Currently, this process is (mostly) serialized due to the fact that Jenkins is configured to only run one build of each job at a time. This serialized aspect of trunk gating is desirable -- it means that each change is tested exactly as it will be eventually merged into the repository. For example, change A will be tested against HEAD, then merged, then change B will be tested against HEAD, which now includes change A. If we allowed those jobs to run in parallel, it may be that change A introduces a condition that causes change B to fail, but without testing B against A, we would not detect it until after the change is merged. Strict serialization of testing and merging changes is therefore useful.
However, a problem arises as the tests become longer or the rate of changes increases. If a given test takes, say, one hour (which is entirely reasonable for some kinds of tests), then the entire project can merge, at most, 24 changes each day. That does not scale, and it is quite inconvenient for developers too, who may have to wait a very long time for their changes to merge.
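The arithmetic behind that ceiling is simple enough to write down. The following is only an illustrative sketch; the function name and its parameter are made up for this example and do not come from any OpenStack tool.

```python
def max_serial_merges_per_day(test_minutes: float) -> int:
    """Upper bound on merges per day when gate runs are strictly serialized.

    Assumes (for illustration) that every change takes exactly
    `test_minutes` to test and that the gate is never idle.
    """
    minutes_per_day = 24 * 60
    return int(minutes_per_day // test_minutes)


print(max_serial_merges_per_day(60))   # one-hour test run -> 24
print(max_serial_merges_per_day(90))   # 90-minute test run -> 16
```

The bound only gets worse as tests grow longer, regardless of how much hardware is available, because a serialized gate uses one test slot at a time.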
When processor designers hit the wall for how fast a processor could execute instructions, they branched out, so to speak. Taking a page from processor design, I have written a program that performs speculative execution of tests. By constructing a virtual queue of changes based on the order of their approval, it runs jobs in parallel, assuming that all of them will succeed. If one of them fails, any jobs that were run on the assumption that it would succeed are re-run without the problematic change included. This means that in the best case, as many changes can be tested and merged in parallel as computing resources will allow. And of course, with cloud computing, that isn't much of a hurdle.
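To make the idea concrete, here is a toy simulation of speculative gating. It is a simplified sketch and not Zuul's actual code: the function name speculative_gate, the run_tests callback, and the notion of discrete "rounds" are all assumptions made for this example. Each round stands for one batch of jobs launched in parallel.

```python
from typing import Callable, List, Sequence, Tuple


def speculative_gate(
    queue: Sequence[str],
    run_tests: Callable[[str, Tuple[str, ...]], bool],
) -> Tuple[List[str], List[str], int]:
    """Simulate gating a queue of approved changes speculatively.

    queue     -- change names in order of approval
    run_tests -- hypothetical callback: run_tests(change, ahead) returns
                 True if `change` passes when tested on top of `ahead`
    Returns (merged, rejected, rounds).
    """
    merged: List[str] = []
    rejected: List[str] = []
    pending = list(queue)
    rounds = 0
    while pending:
        rounds += 1
        # Every pending change is tested on top of everything ahead of it,
        # on the assumption that all of those changes will succeed.
        results = [
            bool(run_tests(change, tuple(merged) + tuple(pending[:i])))
            for i, change in enumerate(pending)
        ]
        if all(results):
            merged.extend(pending)
            pending = []
        else:
            first_bad = results.index(False)
            # Changes ahead of the failure were tested exactly as merged.
            merged.extend(pending[:first_bad])
            rejected.append(pending[first_bad])
            # Results behind the failure are discarded; those changes are
            # re-run next round without the failed change.
            pending = pending[first_bad + 1:]
    return merged, rejected, rounds


def fake_tests(change: str, ahead: Tuple[str, ...]) -> bool:
    # Made-up behaviour: B is broken, and C only breaks on top of B.
    if change == "B":
        return False
    if change == "C" and "B" in ahead:
        return False
    return True


print(speculative_gate(["A", "B", "C", "D"], fake_tests))
# (['A', 'C', 'D'], ['B'], 2)
```

In this example four changes are resolved in two rounds of parallel testing instead of four serial runs, and C still merges because it is re-tested without B.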
Most changes to OpenStack do pass tests the first time, so planning for the best case is very useful. Other changes we are making, such as running tests on changes as soon as they are uploaded to Gerrit for review, will help to provide early feedback to developers so that reviewers (and Jenkins) don't waste time trying to merge changes that we know ahead of time will fail.
The program that now drives our execution of tests is called Zuul. It is quite generalized and not at all specific to the OpenStack workflow. In fact, it's so configurable, it doesn't even have the idea of gating programmed into it. With only some YAML configuration, it can be made to run all of the kinds of jobs we've developed during the course of OpenStack development:
Check jobs: tests that run immediately on submission of a patch. No speculative execution is done; all tests can simply run in parallel and provide early feedback to developers.
Gate jobs: changes are tested in parallel but in a virtually serialized manner so that each change is tested exactly as it will be merged. Changes with failed tests don't merge.
Post jobs: jobs that run after a change is committed (e.g., generating a tarball or documentation).
Silent jobs: jobs that should not provide feedback (perhaps the jobs are not ready for production use).
Zuul can be found here:
It should be easy to use with any project that uses Gerrit and Jenkins. The internal interfaces should be clean enough that if you don't use Jenkins, you can easily plug in another kind of job system (patches welcome!). With a little more trouble, you could probably factor Gerrit out as well.
Development is done just like the rest of the OpenStack project. Clone the git repo, commit your change, and "git review". Visit us in #openstack-infra on Freenode if you want to chat about it.
Posted by James E. Blair on April 25, 2012 at 12:14 PM
OpenStack projects have gated trunks -- that is, every change to an OpenStack project must pass unit and integration tests. Each project requires a number of Jenkins jobs to accomplish this, and some support within the project in the form of configuration files and test interfaces. Until recently, this was managed in an ad-hoc manner, but as we add projects and tests, that won't scale. We currently have 235 Jenkins jobs, and that's way too many to manage manually.
Enter the standardized Project Testing Interface. It lays out all the processes for testing and distribution that a project needs to support to work with the OpenStack Jenkins system. By standardizing this, we can start to manage Jenkins jobs collectively instead of individually. This means that not only is it easier to add new projects, but we can be sure that existing projects benefit from improvements in the system and avoid bit-rot.
The OpenStack Common project helps ensure that the code in each project that handles project setup, dependencies, versions, etc, is kept in sync and standardized. The Project Testing Interface depends on openstack-common for the project-side of its implementation.
Finally, the OpenStack CI team (Andrew Hutchings in particular) has been developing a system to manage our Jenkins configuration within puppet. That's how we plan on managing groups of Jenkins jobs, and it also means that changes to the Jenkins configuration can go through code review, just like any other change to the project. Anyone can submit changes to the running Jenkins configuration without any special administrative privileges.
All of these efforts came together this morning when we bootstrapped the Cinder project, the breakout of the volumes component from Nova. Adding "cinder" to the list of standard python jobs in puppet caused all of our standard packaging and gating jobs to be created in Jenkins. OpenStack Common generated the skeleton code for the project that conforms to the Project Testing Interface. And when that code was submitted for review, it passed the automatically-created gate jobs. From an infrastructure standpoint, Cinder went from an empty repository to a fully integrated OpenStack project in just a few minutes.
I've recently made some big changes to OpenStack's devstack gate scripts. As a developer, here's what you need to know about how they work, and what tools are available to help you diagnose a problem.
All changes to core OpenStack projects are "gated" on a set of tests, so that a change will not be merged into the main repository unless it passes all of the configured tests. Most projects require unit tests in python2.6 and python2.7, and pep8. Those tests are all run only on the project in question. The devstack gate test, however, is an integration test and ensures that a proposed change still enables several of the projects to work together. Currently, any proposed change to the following projects must pass the devstack gate test:
Obviously we test nova, glance, keystone, horizon and their clients because they all work closely together to form an OpenStack system. Changes to devstack itself are also required to pass this test so that we can be assured that devstack is always able to produce a system capable of testing the next change to nova. The devstack gate scripts themselves are included for the same reason.
A Tour of the Devstack Gate
The devstack test starts with an essentially bare virtual machine, installs devstack on it, and runs some simple tests of the resulting OpenStack installation. In order to ensure that each test run is independent, the virtual machine is discarded at the end of the run, and a new machine is used for the next run. In order to keep the actual test run as short and reliable as possible, the virtual machines are prepared ahead of time and kept in a pool ready for immediate use. The process of preparing the machines ahead of time reduces network traffic and external dependencies during the run.
The mandate of the devstack-gate project is to prepare those virtual machines, ensure that enough of them are always ready to run, bootstrap the test process itself, and clean up when it's done. The devstack gate scripts should be configurable to provision machines based on several images (e.g., natty, oneiric, precise), and each of those from several providers. Using multiple providers gives the entire system a degree of high availability, since only one provider needs to function in order for us to run tests. Supporting multiple images will help with the transition of testing from oneiric to precise, and will allow us to continue running tests for stable branches on older operating systems.
To accomplish all of that, the devstack-gate repository holds several scripts that are run by Jenkins.
Once per day, for every image type (and provider) configured, the devstack-vm-update-image.sh script checks out the latest copy of devstack, and then runs the devstack-vm-update-image.py script. It boots a new VM from the provider's base image, installs some basic packages (build-essential, python-dev, etc.), runs puppet to set up the basic system configuration for the openstack-ci project, caches all of the Debian and pip packages and test images specified in the devstack repository, and clones the OpenStack project repositories. It then takes a snapshot image of that machine to use when booting the actual test machines. When they boot, they will already be configured and have all, or nearly all, of the network-accessible data they need. Then the template machine is deleted. The Jenkins job that does this is devstack-update-vm-image. It is a matrix job that runs for all configured providers, and if any of them fails, it's not a problem since the previously generated image will still be available.
Even though launching a machine from a saved image is usually fast, depending on the provider's load it can sometimes take a while, and it's possible that the resulting machine may end up in an error state, or have some malfunction (such as a misconfigured network). Due to these uncertainties, we provision the test machines ahead of time and keep them in a pool. Every ten minutes, a job runs to spin up new VMs for testing and add them to the pool, using the devstack-vm-launch.py script. Each image type has a parameter specifying how many machines of that type should be kept ready, and each provider has a parameter specifying the maximum number of machines allowed to be running on that provider. Within those bounds, the job attempts to keep the requested number of machines up and ready to go at all times. The Jenkins job that does this is devstack-launch-vms. It is also a matrix job that runs for all configured providers.
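The two bounds described above interact in a simple way. The sketch below is not taken from devstack-vm-launch.py; the function and parameter names are hypothetical and only illustrate how a "keep N ready" target and a per-provider ceiling could be combined on each ten-minute run.

```python
def machines_to_launch(
    min_ready: int,
    ready_now: int,
    provider_max: int,
    provider_running: int,
) -> int:
    """How many VMs to boot for one image type on one provider.

    min_ready        -- how many machines of this image type should be ready
    ready_now        -- how many are currently ready in the pool
    provider_max     -- maximum machines allowed to run on this provider
    provider_running -- machines currently running on this provider
    (All four names are assumptions made for this example.)
    """
    shortfall = max(0, min_ready - ready_now)
    headroom = max(0, provider_max - provider_running)
    # Never boot more than the provider limit leaves room for.
    return min(shortfall, headroom)


print(machines_to_launch(min_ready=5, ready_now=2, provider_max=10, provider_running=4))  # 3
print(machines_to_launch(min_ready=5, ready_now=2, provider_max=10, provider_running=9))  # 1
```

In the second call the pool is three machines short, but the provider limit leaves room for only one more, so the remainder has to wait for a later run.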
When a proposed change is approved by the core reviewers, Jenkins triggers the devstack gate test itself. This job runs the devstack-vm-gate.sh script, which checks out code from all of the involved repositories, merges the proposed change, fetches the next available VM from the pool that matches the image type that should be tested (e.g., oneiric) using the devstack-vm-fetch.py script, rsyncs the Jenkins workspace (including all the source code repositories) to the VM, installs a devstack configuration file, and invokes devstack. Once devstack is finished, it runs exercise.sh, which performs some basic integration testing. After everything is done, the script copies all of the log files back to the Jenkins workspace and archives them along with the console output of the run. If testing was successful, it deletes the node. The Jenkins job that does this is the somewhat awkwardly named gate-integration-tests-devstack-vm.
If testing fails, the machine is not immediately deleted. It's kept around for 24 hours in case it contains information critical to understanding what's wrong. In the future, we hope to be able to install developer SSH keys on VMs from failed test runs, but for the moment the policies of the providers who are donating test resources do not permit that. However, most problems can be diagnosed from the log data that are copied back to Jenkins. There is a script that cleans up old images and VMs that runs once per hour. It's devstack-vm-reap.py and is invoked by the Jenkins job devstack-reap-vms.
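The retention rule can be expressed in a few lines. This is a sketch of the policy only, not the contents of devstack-vm-reap.py; the GateVM record and its fields are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class GateVM:
    # Hypothetical record of a test machine; field names are assumptions.
    name: str
    test_failed: bool
    finished_at: datetime


def vms_to_reap(
    vms: List[GateVM],
    now: datetime,
    keep_failed_for: timedelta = timedelta(hours=24),
) -> List[str]:
    """Return the names of machines that are safe to delete.

    Machines from successful runs can go right away; machines from
    failed runs are kept until the retention window has passed.
    """
    return [
        vm.name
        for vm in vms
        if not vm.test_failed or now - vm.finished_at >= keep_failed_for
    ]


now = datetime(2012, 1, 2, 12, 0)
pool = [
    GateVM("old-failure", True, now - timedelta(hours=25)),
    GateVM("fresh-failure", True, now - timedelta(hours=1)),
    GateVM("leftover-success", False, now - timedelta(minutes=5)),
]
print(vms_to_reap(pool, now))  # ['old-failure', 'leftover-success']
```

Running something like this once per hour means a failed machine is removed somewhere between 24 and 25 hours after its test run ends.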
How to Debug a Devstack Gate Failure
When Jenkins runs gate tests for a change, it leaves comments on the change in Gerrit with links to the test run. If a change fails the devstack gate test, you can follow it to the test run in Jenkins to find out what went wrong. The first thing you should do is look at the console output (click on the link labeled "[raw]" to the right of "Console Output" on the left side of the screen). You'll want to look at the raw output because Jenkins will truncate the large amount of output that devstack produces. Skip to the end to find out why the test failed (keep in mind that the last few commands it runs deal with copying log files and deleting the test VM -- errors that show up there won't affect the test results). You'll see a summary of the devstack exercise.sh tests near the bottom. Scroll up to look for errors related to failed tests.
You might need some information about the specific run of the test. At the top of the console output, you can see all the git commands used to set up the repositories, and they will output the (short) sha1 and commit subjects of the head of each repository.
It's possible that a failure could be a false negative related to a specific provider, especially if there is a pattern of failures from tests that run on nodes from that provider. In order to find out which provider supplied the node the test ran on, search for "NODE_PROVIDER=" near the top of the console output.
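If you have saved the raw console output to a file, a few lines of Python will pull the provider out. This is just a convenience sketch: it assumes the provider name follows "NODE_PROVIDER=" directly with no spaces, and the sample text below is made up rather than copied from a real run.

```python
import re
from typing import Optional


def find_node_provider(console_text: str) -> Optional[str]:
    """Return the value after the first 'NODE_PROVIDER=' in the log, if any."""
    match = re.search(r"NODE_PROVIDER=(\S+)", console_text)
    return match.group(1) if match else None


# Made-up sample; real console output will look different.
sample = "Started by user\nNODE_PROVIDER=example-provider\nrunning devstack\n"
print(find_node_provider(sample))  # example-provider
```

Collecting this value across several failed runs is a quick way to check whether the failures cluster on one provider.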
Below that, you'll find the output from devstack as it installs all of the Debian and pip packages required for the test, and then configures and runs the services. Most of what it needs should already be cached on the test host, but if the change to be tested includes a dependency change, or there has been such a change since the snapshot image was created, the updated dependency will be downloaded from the Internet, which could cause a false negative if that download fails.
Assuming that there are no visible failures in the console log, you may need to examine the log output from the OpenStack services. Back on the Jenkins page for the build, you should see a list of "Build Artifacts" in the center of the screen. All of the OpenStack services are configured to log to syslog, so you may find helpful log messages by clicking on "syslog.txt". Some errors are so basic that they never make it to syslog, such as when a service fails to start. Devstack starts all of the services in screen, and you can see the output captured by screen in files named "screen-*.txt". You may find a traceback there that isn't in syslog.
After examining the output from the test, if you believe the result was a false negative, you can retrigger the test by clicking on the "Retrigger" link on the left side of the screen. If a test failure is a result of a race condition in the OpenStack code, please take the opportunity to try to identify it, and file a bug report or fix the problem. If it seems to be related to a specific devstack gate node provider, we'd love it if you could help identify what the variable might be (whether in the devstack-gate scripts, devstack itself, OpenStack, or even the provider's service).
All of the OpenStack developer infrastructure is freely available and managed in source code repositories just like the code of OpenStack itself. If you'd like to contribute, just clone and propose a patch to the relevant repository:
You can file bugs on the openstack-ci project:
And you can chat with us on Freenode in #openstack-dev or #openstack-infra.
The next thing planned for the devstack-gate scripts is to start running Tempest, the OpenStack integration test suite, as part of the process. This will provide more thorough testing of the system that devstack sets up, and of course will help Tempest to evolve in step with the rest of the system.