Testing NetBSD automagically
Speaker: Martin Husemann
1. Building blocks
1.1. ATF — the automated testing framework
ATF originated in the Google Summer of Code 2007. Its successor, kyua, has already been announced and is almost ready to take over.
ATF has bindings for shell, C, and other languages. It is easily scriptable, allows fully automated runs as easy as make regress, and produces nice reports (in ASCII, HTML, and XML, amongst others). It does full runs without stopping at the first error. Tests are organised in a hierarchy, and it is possible to run only a part of it (e.g. only the kernel tests).
The suite currently runs several times a day (thus requiring a buildable CVS tree; more on this later).
Test programs can also run outside of ATF, so they can be shipped to users for them to run.
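The full and partial runs mentioned above are typically driven from the test hierarchy installed under /usr/tests on NetBSD. A minimal sketch, assuming that standard layout and the atf-run/atf-report tools (the script only prints the commands rather than executing them, since they need a NetBSD host):

```shell
# Sketch: invoking ATF over the whole hierarchy, or only a slice of it.
# Paths assume NetBSD's standard /usr/tests layout; commands are
# printed to a file instead of being executed.

full_run="cd /usr/tests && atf-run | atf-report"          # complete run
kernel_run="cd /usr/tests/kernel && atf-run | atf-report" # kernel tests only

printf '%s\n' "$full_run" "$kernel_run" > atf_commands.txt
cat atf_commands.txt
```

Because atf-run keeps going past failures, a single invocation at the top of the hierarchy yields a complete report.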
1.2. Rump — the runnable userspace meta program
Rump enables one to run kernel code in user space. It consists of a set of shared libraries, each representing a slice of the full kernel. Client code (mostly userland) is compiled as if for a new architecture. Rump operates on file system images (regular files on the host system). Rump makes debugging easier (userland vs. kernel).
There is also librumphijack, which is LD_PRELOAD'able, for instance to give a new TCP stack to unmodified binaries.
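A sketch of what such a hijacked invocation looks like. The control socket path and the target host are placeholders; the rump kernel itself would have been started beforehand with rump_server(1). The script only prints the command, since running it requires a NetBSD host with rump installed:

```shell
# Sketch: route an unmodified binary's networking through a rump
# TCP/IP stack. RUMP_SERVER names the rump kernel's control socket
# (placeholder path); 192.0.2.1 is a placeholder address.
sock="unix:///tmp/rumpctl"
hijack_cmd="env LD_PRELOAD=/usr/lib/librumphijack.so RUMP_SERVER=$sock telnet 192.0.2.1"
echo "$hijack_cmd" > hijack_cmd.txt
cat hijack_cmd.txt
```

The binary is unmodified: librumphijack intercepts its socket calls at load time and redirects them to the rump server.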
1.3. QEMU
From the point of view of build.sh, QEMU is just a different architecture. It allows easy scripting, which is very important for testing, plus access to the serial console of the guest. It has some rough edges: notably, some failures come from the emulated FPU.
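The serial-console access is what makes scripted runs practical: with no graphics, the guest's console goes straight to stdio and can be captured. A minimal sketch of such an invocation (the disk image name is a placeholder; the command is printed rather than executed):

```shell
# Sketch: a scriptable QEMU invocation. -nographic attaches the
# guest's serial console to stdio, so a driver script can read and
# parse everything the guest prints. Image name is a placeholder.
qemu_cmd="qemu-system-i386 -m 128 -nographic -hda netbsd-test.img"
echo "$qemu_cmd" > qemu_cmd.txt
cat qemu_cmd.txt
```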
1.4. Anita
Anita is a Python script that takes a release or a freshly downloaded snapshot, creates a brand-new QEMU environment, and runs the full test suite. In itself, the installation of the new release is already a good test.
Anita takes about 1h30 to run.
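A sketch of a typical anita invocation: given the URL of a release or daily snapshot, it downloads the sets, installs them into a fresh QEMU disk image, and runs the tests. The URL below is a placeholder for a real snapshot directory, and the command is printed rather than executed:

```shell
# Sketch: "anita test <url>" downloads, installs, and tests a build.
# The URL is a placeholder for a NetBSD-daily snapshot directory.
release="http://nyftp.netbsd.org/pub/NetBSD-daily/HEAD/latest/i386/"
echo "anita test $release" > anita_cmd.txt
cat anita_cmd.txt
```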
2. Integration
The tests are provided with the release. There are 2800 test programs in the tree. Today, there are as many test commits as code-change commits: a good thing, enabled by the framework.
3. Demonstration
3.1. Shell scripting
The test environment is just like good ol' /bin/sh, with a set of shell functions and variables already defined. A test describes its requirements (for instance, "I need access to cc(1)"). Then it describes the command to be run, along with the expected exit code and output. The tests are run in a clean, empty directory.
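A minimal sketch of such a shell test case, following the pattern described above (requirements in a head function, then atf_check assertions on exit status and output). Since atf-sh may not be installed, the script just writes the test file and shows it; on NetBSD it would be run with atf-run/atf-report:

```shell
# Sketch of an ATF shell test case. The test name "compile" and the
# file name are illustrative choices, not from the talk.
cat > example_test.sh <<'EOF'
atf_test_case compile
compile_head() {
    atf_set "descr" "checks that cc can build a trivial program"
    atf_set "require.progs" "cc"   # skipped automatically if cc(1) is absent
}
compile_body() {
    cat > hello.c <<'SRC'
int main(void) { return 0; }
SRC
    # assert exit status 0 and no output on stdout or stderr
    atf_check -s exit:0 -o empty -e empty cc -o hello hello.c
}
atf_init_test_cases() {
    atf_add_test_case compile
}
EOF
cat example_test.sh
```

The clean, empty working directory mentioned above is created by the framework for each test case, so the test can write hello.c without worrying about leftovers.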
3.2. C bindings
The C interface to ATF is a series of pre-processor macros, describing meta-information about the program and allowing ATF to check the behaviour of the library under test.
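A sketch of those macros in use: ATF_TC and ATF_TC_HEAD declare the test and its meta-information, ATF_TC_BODY holds the checks, and ATF_TP_ADD_TCS registers the cases. The test name and checked function are illustrative; the file is written out rather than compiled, since building it needs atf-c.h from the ATF distribution:

```shell
# Sketch of an ATF C test program, written to a file for display.
cat > example_test.c <<'EOF'
#include <atf-c.h>
#include <string.h>

ATF_TC(strlen_basic);
ATF_TC_HEAD(strlen_basic, tc)
{
    /* meta-information about this test case */
    atf_tc_set_md_var(tc, "descr", "checks strlen on a short string");
}
ATF_TC_BODY(strlen_basic, tc)
{
    /* behaviour check on the library under test */
    ATF_CHECK_EQ(5, strlen("hello"));
}

ATF_TP_ADD_TCS(tp)
{
    ATF_TP_ADD_TC(tp, strlen_basic);
    return atf_no_error();
}
EOF
cat example_test.c
```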
3.3. Curses applications
This is more difficult to test. A new $TERM was defined, in which control sequences are displayed verbatim. Two programs are involved: a director (programmed in a small DSL) and a slave; the two communicate over a socket. The timing between keystrokes and between screen updates is handled. The diff against the golden output was designed to be easy to read.
3.4. Rump example
Martin showed a test case in which a pair of TTYs are opened in the wrong order; this made the running kernel crash. This bug report was not easy to debug in kernel code; it was much easier to reproduce and debug in userland code.
3.5. Simulated network (Rump second example)
Martin demonstrated a small setup with three Rump instances: a router, on which tcpdump(8) was running, and two endpoints. In this setup, the NICs are implemented as shmif (shared memory) interfaces.
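A sketch of how such a three-node network is assembled: each node is a rump_server loading the networking components plus the shmif driver, and a shared memory file acts as the wire between them. Socket paths are placeholders, and the commands are printed rather than executed (they need a NetBSD host with rump installed):

```shell
# Sketch: three rump kernel instances for a simulated network.
# Each loads the base networking stack plus the shmif NIC driver;
# interfaces would then be attached to a shared "bus" file.
libs="-lrumpnet -lrumpnet_net -lrumpnet_netinet -lrumpnet_shmif"
{
    echo "rump_server $libs unix:///tmp/endpoint1"
    echo "rump_server $libs unix:///tmp/router"
    echo "rump_server $libs unix:///tmp/endpoint2"
} > rump_net.txt
cat rump_net.txt
```

The whole network lives in ordinary user processes, which is what makes running tcpdump(8) on the "router" in a fully scripted test possible.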
4. Report
The report is the output of ATF. It includes expected failures (bugs already known, with a ticket) and skipped tests (e.g. those that take too long to run).
5. Bugs found
The testing framework in NetBSD has allowed a lot of bugs to be caught early. The bugs can be sorted into these categories:
- build breaks;
- bad tests;
- emulator bugs (in particular some bugs in the FPU of QEMU);
- odd bugs in the toolchain (Martin cited a bug in the shell that was triggered by the import of a new gcc(1); obviously, this caused a lot of failures in the test run);
- random bugs all around;
- real regressions.
6. Conclusion
This testing framework was possible thanks to the right mindset of developers. Most of the tests are written by developers. This is made possible thanks to the tools: testing is easy with the right framework. The number of tests in the tree is growing fast. However, Martin insisted, writing good tests is not a trivial task.
Automated testing depends a lot on the tree being buildable at all times. This added pressure on the developers to keep the code buildable. This was sometimes a nuisance.[1]
Martin had only one regret: not doing this earlier.
Footnotes:
[1] I shall note that there was another talk, which I didn't attend, whose topic was modernising NetBSD, in particular moving from CVS to Fossil.