seanm.ca:70/0/nerd/500mileemail.txt
I almost regret posting the story to a wide audience, because it makes a great tale over drinks at a conference.

I was working in a job running the campus email system some years ago when I got a call from the chairman of the statistics department.

"Email really doesn't work that way, generally," I said, trying to keep panic out of my voice. One doesn't display panic when speaking to a department chairman, even of a relatively impoverished department like statistics.

"Just not more than--"

"--500 miles, yes," I finished for him, "I got that."

I tried to remember if someone owed me a practical joke.

I logged into their department's server, and sent a few test mails. This was in the Research Triangle of North Carolina, and a test mail to my own account was delivered without a hitch. Ditto for one sent to Richmond, and Atlanta, and Washington.

But then I tried to send an email to Memphis (600 miles).

I got out my address book and started trying to narrow this down. New York (420 miles) worked, but Providence (580 miles) failed.

I tried emailing a friend who lived in North Carolina, but whose ISP was in Seattle. If the problem had had to do with the geography of the human recipient and not his mail server, I think I would have broken down in tears.

And I was fairly certain I hadn't enabled the "FAIL_MAIL_OVER_500_MILES" option.

The server happily responded with a SunOS sendmail banner.

At the time, Sun was still shipping Sendmail 5 with its operating system, even though Sendmail 8 was fairly mature. Being a good system administrator, I had standardized on Sendmail 8.

The pieces fell into place, all at once, and I again choked on the dregs of my now-cold latte. When the consultant had "patched the server," he had apparently upgraded the version of SunOS, and in so doing *downgraded* Sendmail.

But the new long configuration options -- those it saw as junk, and skipped.

One of the settings that was set to zero was the timeout to connect to the remote SMTP server.
Some experimentation established that on this particular machine with its typical load, a zero timeout would abort a connect call in slightly over three milliseconds. An odd feature of our campus network at the time was that it was 100% switched. An outgoing packet wouldn't incur a router delay until hitting the POP and reaching a router on the far side. So time to connect to a lightly-loaded remote host on a nearby network would actually largely be governed by the speed of light distance to the destination rather than by incidental router delays.
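The arithmetic behind that last claim can be checked with a rough sketch. The function name below is made up for illustration, and the calculation assumes light in a vacuum covering the distance one way within the three-millisecond window; signals in real cable travel slower, and a connect call needs a round trip, so treat the result as a back-of-the-envelope upper bound rather than a measurement.

```python
# Back-of-the-envelope check: how far does light travel in ~3 milliseconds?
# Assumptions (not from the story): vacuum speed of light, one-way distance.

SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre
METERS_PER_MILE = 1609.344            # international mile, exact


def light_distance_miles(seconds: float) -> float:
    """Distance in miles that light covers in a vacuum in `seconds`."""
    return SPEED_OF_LIGHT_M_PER_S * seconds / METERS_PER_MILE


if __name__ == "__main__":
    timeout_s = 0.003  # "slightly over three milliseconds", rounded down
    one_way = light_distance_miles(timeout_s)
    print(f"one-way distance in {timeout_s * 1000:.0f} ms: {one_way:.2f} miles")
    # If the whole window had to cover a full round trip, the reach halves.
    print(f"round-trip reach: {one_way / 2:.2f} miles")
```

Under those assumptions the one-way figure comes out a little under 560 miles, which lines up with mail working to New York (420 miles) and failing to Providence (580 miles).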