Testing NetBSD automagically
Speaker: Martin Husemann
1. Building blocks
1.1. ATF — the automated testing framework
ATF originated in the Google Summer of Code 2007. Its successor, kyua, has already been announced and is almost ready to take over.
ATF has bindings for shell, C, and other languages. It is easily scriptable, allows fully automated runs as easy as make regress, and produces nice reports (in ASCII, HTML, and XML, amongst others). It does full runs without stopping at the first error. Tests are organised in a hierarchy, and it is possible to run only a part of it (e.g. only the kernel tests).
The suite currently runs several times a day (thus requiring a buildable CVS tree; more on this later).
Test programs can also run outside of ATF, so they can be shipped to users for them to run.
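The full and partial runs mentioned above are typically driven from the test hierarchy installed under /usr/tests on NetBSD. A minimal sketch, assuming that standard layout and the atf-run/atf-report tools (the script only prints the commands rather than executing them, since they need a NetBSD host):

```shell
# Sketch: invoking ATF over the whole hierarchy, or only a slice of it.
# Paths assume NetBSD's standard /usr/tests layout; commands are
# printed to a file instead of being executed.

full_run="cd /usr/tests && atf-run | atf-report"          # complete run
kernel_run="cd /usr/tests/kernel && atf-run | atf-report" # kernel tests only

printf '%s\n' "$full_run" "$kernel_run" > atf_commands.txt
cat atf_commands.txt
```

Because atf-run keeps going past failures, a single invocation at the top of the hierarchy yields a complete report.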
1.2. Rump — the runnable userspace meta program
Rump enables one to run kernel code in user space. It consists of a set of shared libraries, each representing a slice of the full kernel. Client code (mostly userland) is compiled as if for a new architecture. Rump operates on file system images (regular files on the host system). Rump makes debugging easier (userland vs. kernel).
There is also librumphijack, which is LD_PRELOAD'able, for instance to give a new TCP stack to unmodified binaries.
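A sketch of what such a hijacked invocation looks like. The control socket path and the target host are placeholders; the rump kernel itself would have been started beforehand with rump_server(1). The script only prints the command, since running it requires a NetBSD host with rump installed:

```shell
# Sketch: route an unmodified binary's networking through a rump
# TCP/IP stack. RUMP_SERVER names the rump kernel's control socket
# (placeholder path); 192.0.2.1 is a placeholder address.
sock="unix:///tmp/rumpctl"
hijack_cmd="env LD_PRELOAD=/usr/lib/librumphijack.so RUMP_SERVER=$sock telnet 192.0.2.1"
echo "$hijack_cmd" > hijack_cmd.txt
cat hijack_cmd.txt
```

The binary is unmodified: librumphijack intercepts its socket calls at load time and redirects them to the rump server.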
1.3. QEMU
From the point of view of build.sh, QEMU is just a different architecture. It allows easy scripting, which is very important for testing, plus access to the serial console of the guest. It has some rough edges: notably, some failures come from the emulated FPU.
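The serial-console access is what makes scripted runs practical: with no graphics, the guest's console goes straight to stdio and can be captured. A minimal sketch of such an invocation (the disk image name is a placeholder; the command is printed rather than executed):

```shell
# Sketch: a scriptable QEMU invocation. -nographic attaches the
# guest's serial console to stdio, so a driver script can read and
# parse everything the guest prints. Image name is a placeholder.
qemu_cmd="qemu-system-i386 -m 128 -nographic -hda netbsd-test.img"
echo "$qemu_cmd" > qemu_cmd.txt
cat qemu_cmd.txt
```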
1.4. Anita
Anita is a Python script that takes a release or a freshly downloaded snapshot, creates a brand-new QEMU environment, and runs the full test suite. In itself, the installation of the new release is already a good test.
Anita takes about 1h30 to run.
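A sketch of a typical anita invocation: given the URL of a release or daily snapshot, it downloads the sets, installs them into a fresh QEMU disk image, and runs the tests. The URL below is a placeholder for a real snapshot directory, and the command is printed rather than executed:

```shell
# Sketch: "anita test <url>" downloads, installs, and tests a build.
# The URL is a placeholder for a NetBSD-daily snapshot directory.
release="http://nyftp.netbsd.org/pub/NetBSD-daily/HEAD/latest/i386/"
echo "anita test $release" > anita_cmd.txt
cat anita_cmd.txt
```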
2. Integration
The tests are provided with the release. There are 2800 test programs in the tree. Today, there are as many test commits as code-change commits: a good thing, enabled by the framework.
3. Demonstration
3.1. Shell scripting
The test environment is just like good ol' /bin/sh, with a set of shell functions and variables already defined. A test describes its requirements (for instance, "I need access to cc(1)"). Then it describes the command to be run, along with the expected exit code and output. The tests are run in a clean, empty directory.
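A minimal sketch of such a shell test case, following the pattern described above (requirements in a head function, then atf_check assertions on exit status and output). Since atf-sh may not be installed, the script just writes the test file and shows it; on NetBSD it would be run with atf-run/atf-report:

```shell
# Sketch of an ATF shell test case. The test name "compile" and the
# file name are illustrative choices, not from the talk.
cat > example_test.sh <<'EOF'
atf_test_case compile
compile_head() {
    atf_set "descr" "checks that cc can build a trivial program"
    atf_set "require.progs" "cc"   # skipped automatically if cc(1) is absent
}
compile_body() {
    cat > hello.c <<'SRC'
int main(void) { return 0; }
SRC
    # assert exit status 0 and no output on stdout or stderr
    atf_check -s exit:0 -o empty -e empty cc -o hello hello.c
}
atf_init_test_cases() {
    atf_add_test_case compile
}
EOF
cat example_test.sh
```

The clean, empty working directory mentioned above is created by the framework for each test case, so the test can write hello.c without worrying about leftovers.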
3.2. C bindings
The C interface to ATF is a series of pre-processor macros, describing meta-information about the program and allowing ATF to check the behaviour of the library under test.
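A sketch of those macros in use: ATF_TC and ATF_TC_HEAD declare the test and its meta-information, ATF_TC_BODY holds the checks, and ATF_TP_ADD_TCS registers the cases. The test name and checked function are illustrative; the file is written out rather than compiled, since building it needs atf-c.h from the ATF distribution:

```shell
# Sketch of an ATF C test program, written to a file for display.
cat > example_test.c <<'EOF'
#include <atf-c.h>
#include <string.h>

ATF_TC(strlen_basic);
ATF_TC_HEAD(strlen_basic, tc)
{
    /* meta-information about this test case */
    atf_tc_set_md_var(tc, "descr", "checks strlen on a short string");
}
ATF_TC_BODY(strlen_basic, tc)
{
    /* behaviour check on the library under test */
    ATF_CHECK_EQ(5, strlen("hello"));
}

ATF_TP_ADD_TCS(tp)
{
    ATF_TP_ADD_TC(tp, strlen_basic);
    return atf_no_error();
}
EOF
cat example_test.c
```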
3.3. Curses applications
This is more difficult to test. A new $TERM was defined, in which control sequences are displayed verbatim. Two programs are involved: a director (programmed in a small DSL) and a slave; the two communicate over a socket. The timing between keystrokes and between screen updates is handled. The diff against the golden output was designed to be easy to read.
3.4. Rump example
Martin showed a test case in which a pair of TTYs are opened in the wrong order; this made the running kernel crash. This bug report was not easy to debug in kernel code; it was much easier to reproduce and debug in userland code.
3.5. Simulated network (Rump second example)
Martin demonstrated a small setup with three Rump instances: a router, on which tcpdump(8) was running, and two endpoints. In this setup, the NICs are implemented as shmif (shared memory) interfaces.
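A sketch of how such a three-node network is assembled: each node is a rump_server loading the networking components plus the shmif driver, and a shared memory file acts as the wire between them. Socket paths are placeholders, and the commands are printed rather than executed (they need a NetBSD host with rump installed):

```shell
# Sketch: three rump kernel instances for a simulated network.
# Each loads the base networking stack plus the shmif NIC driver;
# interfaces would then be attached to a shared "bus" file.
libs="-lrumpnet -lrumpnet_net -lrumpnet_netinet -lrumpnet_shmif"
{
    echo "rump_server $libs unix:///tmp/endpoint1"
    echo "rump_server $libs unix:///tmp/router"
    echo "rump_server $libs unix:///tmp/endpoint2"
} > rump_net.txt
cat rump_net.txt
```

The whole network lives in ordinary user processes, which is what makes running tcpdump(8) on the "router" in a fully scripted test possible.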
4. Report
The report is the output of ATF. It includes expected failures (bugs already known, with a ticket) and skipped tests (e.g. those that take too long to run).
5. Bugs found
The testing framework in NetBSD has allowed a lot of bugs to be caught early. The bugs can be sorted into these categories:
- build breaks;
- bad tests;
- emulator bugs (in particular some bugs in the FPU of QEMU);
- odd bugs in the toolchain (Martin cited a bug in the shell that was triggered by the import of a new gcc(1); obviously, this caused a lot of failures in the test run);
- random bugs all around;
- real regressions.
6. Conclusion
This testing framework was possible thanks to the right mindset of developers. Most of the tests are written by developers. This is made possible thanks to the tools: testing is easy with the right framework. The number of tests in the tree is growing fast. However, Martin insisted, writing good tests is not a trivial task.
Automated testing depends a lot on the tree being buildable at all times. This added pressure on the developers to keep the code buildable. This was sometimes a nuisance.[1]
Martin had only one regret: not doing this earlier.
Footnotes:
[1] I shall note that there was another talk, which I didn't attend, whose topic was modernising NetBSD, in particular moving from CVS to Fossil.