Well, the thank-yous have been rather ebullient all day long today and I feel
somewhat embarrassed by the attention. Especially given how long it took us
to get the archive online and visible! It has to be close to 10 years now.
Sigh.
The story is more a story of fits and starts than of resolve. And our
contribution accounts for some (most?) of the first 10 years of the Google
archive.
If I recall correctly, the issue of Henry Spencer's (actually, the University
of Toronto, Department of Zoology's) NetNews archive was raised at a Usenix
conference in the early 90's. The question: can we get at them? Bruce Jones
was especially interested in this. Henry's answer was that it really wasn't
going to be easy because he had neither the disk space nor the tape drive to
pull them all down to make them available.
I, it turned out, did. So one bright winter day I drove from London (Ontario,
Canada) to Toronto (Ontario, Canada), a two-hour drive, in my shiny new pickup
truck and picked up 141 magtapes from the Zoology department at UofT and
brought them back to the Department of Computer Science at the University of
Western Ontario. (A not unimpressive bandwidth, by the way, of some 18Mb/sec :-)
never underestimate the bandwidth of a pickup truck on the highway!)
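For the curious, the truck figure can be sanity-checked with a little arithmetic. The sketch below works backwards from the numbers above (141 tapes, a two-hour drive, about 18Mb/sec) to the amount of data the load would have had to hold. It assumes "Mb" means megabits of 10^6 bits, which is not spelled out above, and the names in it are purely illustrative.

```python
# Back-of-the-envelope check of the pickup-truck bandwidth figure.
# From the text: 141 tapes, a two-hour drive, roughly 18 Mb/sec.
# Assumption (not stated in the text): "Mb" means megabits, 1 Mb = 10**6 bits.

TAPES = 141
DRIVE_SECONDS = 2 * 60 * 60          # two hours on the highway
RATE_BITS_PER_SEC = 18 * 10**6       # the quoted ~18 Mb/sec


def implied_total_bytes(rate_bits_per_sec, seconds):
    """Total bytes that must be on board to sustain the given rate."""
    return rate_bits_per_sec * seconds // 8


total_bytes = implied_total_bytes(RATE_BITS_PER_SEC, DRIVE_SECONDS)
per_tape_bytes = total_bytes / TAPES

print(f"implied load: {total_bytes / 10**9:.1f} GB")              # 16.2 GB
print(f"implied average per tape: {per_tape_bytes / 10**6:.0f} MB")  # 115 MB
```

If "Mb" was meant as megabytes instead, both results come out eight times larger.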
Then with the help of several people (some of whom have not yet been credited)
we started to pull the data off of the tapes and onto disks in both the Computer
Science department and the Robarts Research Institute. Lance Bailey, then
with the Robarts Research Institute, did the pulling there and I, with assistance
from Bob Webber, did it at Computer Science. Bruce Jones from UCSD took some
vacation time and came up here to help pull data down for a week or so as well.
But we quickly ran out of space and time: Lance left Robarts for UBC, Bruce's
vacation ended, and Bob and I got busy doing other things (like our jobs). As
a result, the archive project made very little progress over the next few
years.
Then Brewster Kahle started pushing on us (thanks Brewster!) to get it done.
He even bought us a large disk to hold the archive when we truly ran out of
space. With the help of Sue Thielen, who was out of work and bored, we got all
of the rest of the tapes read down onto that disk. Unfortunately, that disk
was not "close enough" to either a tape drive or the ftp server to make the
data available to anyone. And it wasn't organized in any useful way.
Brewster pushed very gently for a very long time but the new archive project
was far from the top of the list of projects I was supposed to be working on
and I just never got it going again.
Late this summer Michael Schmitt from Google started pushing as well. And as
luck would have it, I was able to hire a student to do the final sorting of
the archive. And, that luck still holding, I managed to "steal" enough
space on the ftp server for the entire archive! But it still took months to get
that figured out and the archive transferred to a machine from which Google
could pull it. It was the middle of October before we were able to make the
collection available to Google. And it is actually available, although totally
unsorted, to anyone who wants it and can deal with pulling some 160 files
ranging in size from 1.4Mb to 65Mb. Just drop me a line to say please and we'll
arrange to make it visible to you.
I'd still like to impose a bit more order on the raw archives than we have but
the time just hasn't allowed for that...