sendmail — history and design
Speaker: Eric Allman
1. History
1.1. How did it all start?
sendmail(8) started as an unofficial project, without any financial support. It was born in the 1980s, out of necessity. Many staff were working on Ernie CoVAX. Eric was working on INGRES; relational databases were the hot new thing at the time. At one point, the INGRES machine (a valiant 16-bit PDP) that Eric was working on got an ARPAnet connection[1]. Many staff wanted access to the INGRES machine so they could use its ARPAnet connection. However, the machine had only two TTY lines, and adding more would have been too expensive.
Eric soon saw that what most people wanted out of the ARPAnet connection was to be able to send mail to other machines. His solution was to develop delivermail, a system for forwarding messages between Ernie and the other systems accessible over the ARPAnet.
At this point, Eric showed us a view of the architecture of delivermail: it mostly connected together a lot of systems, from FTP-delivered mail to UUCP[2]. Each system had its own spool, its own way to address mailboxes… Eric showed a table with a selection of addresses. Each network (BerkNet, ARPAnet, UUCP-accessible machines) used a different way to designate machines and mailboxes, and a moderately complex address could be interpreted in several different ways. foo!bar!baz could mean mailbox baz on machine bar after a hop through foo (over UUCP), or mailbox bar!baz on machine foo, if foo accepted '!' in user names.
This accounted for significant complexity in the initial delivermail.
1.2. First pieces of advice
To build successful software, the first thing one needs to accept (according to Eric) is that a programmer is finite. Don't redesign the UA, even when the UA is /usr/bin/mail. This is not only because users were already used to it, but also because it would be too much work. In the same vein, he decided not to redesign the system mail store.
The second thing was to make delivermail adapt to the world, not the other way around.
These two principles (implementing as little as possible, and adapting to the rest of the world) guided the initial design of delivermail as a rather small message-passing device between other mail systems.
1.3. Problems with the solution
The configuration was compiled in; that is, to install delivermail on a new site, one needed to hack the source, recompile and reinstall.
There was no address translation between networks. Address parsing was both simplistic and opaque. Users needed to read a man page (man pages?) before they could build the mail address of someone living across one or two networks.
But hey, it was supposed to be a quick hack for freeing the INGRES machine, and it kinda worked!
1.4. The transition to sendmail
DARPA gave a grant for completing 4.2BSD (the first with TCP/IP!). Bill Joy asked Eric to add SMTP to delivermail. Supporting SMTP required adding a mail queue, which had quite an impact on the internal architecture. Eric ended up rewriting his dæmon, creating sendmail.
1.5. The chaos years: 1981–90
Eric left Berkeley in 1981 to pursue a lucrative (yeah… :-p) career in industry. Around the same time, Bill Joy left Berkeley to join a new and promising company called Sun. The rest is part of a larger history (told in part by Kirk McKusick), in particular the Unix wars, during which every vendor extended their system (including sendmail) in different, sometimes incompatible directions.
1.6. Return to sendmail
Eric came back to Berkeley in the early 1990s. He started by adding subdomain handling and, one thing leading to another, scope creep resulted in a complete rewrite, called sendmail 8[3].
1.7. Sendmail, Inc.
Sendmail, Inc. was the first commercial / OSS hybrid company (and still employs Eric). At the time, it was not obvious how to build a business plan around such a model; but evidently Eric did not manage too badly.
New features were introduced:
- encryption;
- milters, a feature Eric seems quite proud of;
- virtual hosting;
- LDAP support;
- lots more checking of data that comes from the outside world…
Those features came from commercial needs. Before that, Eric built new features mostly out of a "nice to have" attitude.
1.8. Changes in requirements
Reliability has always been important. An important point of focus was to always get the mail through or send back an error to the user.
With more commercial incentives, Eric started adding functionality and performance, then protection (Fred: I only noted the word, and don't remember more precisely what Eric meant), then legal compliance (think audit and traceability, log retention), then cost control.
2. Design decisions
Eric started with a few remarks:
- it is easier to build a tool than a solution;
- the world at the time was ugly;
- the world today is still ugly.
He would do things mostly the same way, modulo some updates.
2.1. Rewriting rules
In hindsight, they weren't overkill. The concept was sound (regexps replaced with tokens), but the syntax and the control flow could have been better.
One stupid thing was making tabs into active characters. In Eric's defence, if make(1) made tabs active, it must have been a good idea!
Not.
2.2. Message munging
This was essential for interoperability at the time, but is not necessary today. In retrospect, sendmail should have had a passthrough mode, where trash would be either accepted or rejected, without trying to fix others' mistakes.
2.3. Syntax of configuration files
It is ugly, flat (no nesting), with too many special characters.
Today, Eric would have used something like the Apache configuration.
2.4. SMTP and queuing into sendmail
Eric was reluctant to include it, but it was The Right Thing. He would have added more privilege separation.
The queue had two files per message (one for the header, one for the text). Having data and protocols in ASCII helped debugging.
This was the right approach for the time; today Eric would put envelopes in a DB (less thrashing around).
2.5. m4(1) for configuration
The dnl macro was bad. It was added to produce neat output, at the cost of significantly uglifying the input files.
Some tool was needed for the configuration, but m4 was probably not the right one.
2.6. Extending vs changing
This is a big, important concern. In hindsight, Eric thinks he paid too much attention to backward compatibility[4]. He wanted very hard to be able to install a new release over an old one, not touch the configuration, and have sendmail keep on working. This was a noble goal, but it prevented a number of changes that would have broken configurations on upgrade.
3. Things Eric would do differently
Fix problems earlier, as already mentioned above.
Use modern tools, in particular for the build system, which was hand-rolled.
Privilege separation.
A string abstraction.
Separate mailbox names from unix IDs.
A cleaner configuration file.
4. Things Eric would do the same
Use C. Eric described C++ as the ugliest thing in the Universe, having both the limitations of C and the problems of OO languages, without any of their advantages.
Bite things in small chunks (see "the programmer is finite" above).
Use syslog(8), which was very new at the time; it was still being written. Eric quickly grew tired of having random processes write their logs to random files in random places.
The rewriting rules, except for the active TABs.
Don't rely too heavily on outside tools.
5. Lessons learnt
KISS.
Know what you're doing; this is way more important than having an advanced design.
Flexibility trumps performance.
Fix things early. Stuff is easier to fix early, and it is less painful when you have 5 clients than when you have 50. Of course, this sometimes means breaking backward compatibility.
ASCII is great for internal files and protocols.
Documentation is key. According to Eric, the bat book was very important to the success of sendmail with sysadmins.
Footnotes:
[1] The ARPAnet backbone was built on 56 kbps links!
[2] With a half-smile, Eric asked who even remembers UUCP. A guy seated just in front of me said he was still using it. Way to show that Unix maintains backward compatibility; or does it show that we are mostly gray beards who refuse obsolescence? :-)
[3] I seem to remember that numbering did not start at 1, but at something like sendmail 6; I don't have notes backing that up, though.
[4] I feel I have to point to The UNIX Haters Handbook, page 185. Incidentally, the all-too-important backward compatibility issue in this case was also about having TABs as active spaces… I find the Haters Handbook a very interesting read, even today. The game is to find out which concerns have since been fixed, and which ones still hold today.