News and Mailing lists almost back

Angelo and Alex managed to get Alex's news spool to finally import into INN, so we just need to configure the news server and plug in the spools and we should be all set.

I'm working on getting mailman back up; we don't have the lists anymore, so people are going to have to subscribe themselves when it does come back up.

Progress report

Mail delivery is online and the spools have drained. Last I checked, there were about 40 messages in the queue (out of a total of ~18k that we started with).

Mail submission is working. You should be able to send mail both locally and, if you're remote, via authenticated SSL. There does seem to be a configuration problem causing fury and maybe others to reject; it looks like an IPv6 config issue. We're looking into it.

IMAP and POP are online. SASL was crashing because it was linked against the wrong version of kerberos. Dan W. got that fixed, thankfully.

None of the fancy filtering stuff is online - OpenSPF, SpamAssassin-via-start-settings, Greylisting, Amavis/ClamAV are all out right now. We'll be working on this tonight.

Mailing lists are gone, newsmail/mailnews are gone. We'll be rebuilding those soonish.

News is progressing. Angelo managed to get the old DNEWS spools running on a test machine. We'll start to get INN installed on whitefox and migrate the spools to that. More progress on that sometime between tonight and Sunday.

Alex Grant is fighting with getting his news cache into a test server so it can be imported into the new news server, but we have the raw data, so it's just a matter of getting it imported. If worst comes to worst, there are several hard ways to do it.

Delivery onlinish

The delivery half of mail is running, draining out the relay. Whitefox's 60.19 ip is still firewalled, letting the relay finish first. Once that happens, we'll unfirewall it.

IMAP and POP should be kinda up. Mail submission is still down right now.

Mail delivery almost ready

We're nearing the final stages of having mail /delivery/ back online, with several caveats.

We will have procmail processing; spamassassin, nmh, razor-agents are installed and look like they're working.

All of the other crazy configuration stuff we had - openspf, greylisting, amavis/clamav and such will not be in. We're going for simple for now.

Actually getting to your mail will come later. IMAP and POP are having a hard time (saslauthd and friends are throwing core files). But with mail delivery online, forwards and procmail processing will happen, and mail reading via pine and mutt and such /might/ work.

News recovered

------- Forwarded message -------
From: "Alex Grant"
Subject: News recovered
Date: Thu, 05 Mar 2009 06:20:50 -0500

Oh hai guyz. Just letting everyone know that I have a nearly-complete
backup of news, in INN's tradspool format, up to the day before the
server went down. Since I previously set my news client to cache the
contents of every message ever posted, I wrote a script to convert
from one format to the other and there you go. (It's "nearly-complete"
because it doesn't include groups I wasn't subscribed to; however, the
only groups I wasn't subscribed to are the ones nobody's posted
anything in for years, so we should be able to recover those
completely from the Really Old Backup)
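
The conversion script itself isn't included in the message, so what follows is only a rough sketch of the destination side. In tradspool, each article is its own file, named by its article number, in a directory built from the group name with the dots turned into slashes. The spool root, the group names and the assumption that each article is already in hand as raw bytes are all made up for illustration; the real input was whatever Alex's news client cache looked like.

```python
from pathlib import Path

def tradspool_path(spool_root, newsgroup, article_number):
    """Return the file path a tradspool layout would use for one article."""
    # "a.b.c" becomes the directory a/b/c; the file is named after the
    # article number within that group.
    return Path(spool_root).joinpath(*newsgroup.split("."), str(article_number))

def write_article(spool_root, newsgroup, article_number, raw_article):
    """Write one raw article (headers + body, as bytes) into the spool tree."""
    path = tradspool_path(spool_root, newsgroup, article_number)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(raw_article)
    return path
```

Dropping files into place is only part of the job. INN would still need its history and overview data rebuilt before it serves the articles.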

Rebuilding

Rebuilding efforts continue. Got the base OS installed on a FreeBSD geom mirror after much work. Started working on installing packages when I got up around 11 am.

Disk recovery efforts are looking poorer and poorer. It's looking like out of 4 disks, we only have a good image of one, which means we only have half the stripe set. We're still looking into this, but it's looking like our best bet for recovering the news spool is going to be to use the old dnews spool + what people find in their news client caches.

Getting mail back up is #1 priority, but it's going to be a while because of all the crazy crap we had configured. We did find some of the configs on a test machine, so that'll certainly help tremendously, but it's going to be a lot of time spent tweaking and testing. We'll keep posting updates.

News is back-burner right now. What we have is the old spool from when we were using dnews. We're going to need to set up a separate machine, get dnews working on it, and then export news from that. It probably won't be easy. I'm guessing that news will continue to be down for several days after we get mail working. There's still hope in recovering most of the spool, but that will be a slow effort even after we get news working again. BTW, all you newsmail users, if you've got archives of your email, please get in contact with us.

Update 5:57 PM: The FUDForum installation that John Musbach set up seems to have a cache of news posts. We haven't looked into it much yet, but it's likely that it'll have a lot of the more recent messages, if not every message from the server. This is some much-needed good news.

CSH Mail Server - Long live news

This just keeps getting worse. Please read this entire message, or if you don't care to, read the bottom: we might need your help.

Turns out the news spool was probably stored on /usr or /var, which are medium-term dead right now.

If I recall correctly, when we migrated to innd, innd wanted the news spool somewhere under /usr/local/. It was so small and we had the space for it, so we let it continue to use that directory.

We haven't lost all of news - we still have what was left in /news_domain, which is what the old dnews server was using, which is current as of Aug 2008.

At worst, we lose all news since Aug 2008.

We have two long shots to get the news since then:
1) Try to reinterleave the disks by hand from disk images
We're getting the images of the old data onto hactar right now. That effort can start as soon as the images are available.

2) This sounds bad, but it's not entirely impossible: recover the news posts from caches from clients.
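
For option 1, the basic idea of reinterleaving can be sketched like this. It assumes the simplest possible case: plain striping with a fixed chunk size, images listed in the right disk order, and no controller metadata at the front of each disk. None of that is known for whitefox's raid card, and the 64 KB default is just a guess, so treat this as an illustration of the technique and not a recovery tool.

```python
from contextlib import ExitStack

def reinterleave(image_paths, out_path, chunk_size=64 * 1024):
    """Rebuild a striped volume by taking one chunk from each image in turn."""
    with ExitStack() as stack:
        images = [stack.enter_context(open(p, "rb")) for p in image_paths]
        out = stack.enter_context(open(out_path, "wb"))
        while True:
            # One stripe = one chunk from every disk, in disk order.
            stripe = [img.read(chunk_size) for img in images]
            if not any(stripe):
                break  # every image is exhausted
            for chunk in stripe:
                out.write(chunk)
```

In practice the hard part is guessing the chunk size and disk order; one approach is to try candidates and check whether a recognizable filesystem shows up in the output.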

Thus, my request to anybody reading this: Zip up your client's cache of news and upload it to your user directory on house systems, then make a comment here that you've done so. If you need quota, let us know and we'll give it to you. If you decide to post it somewhere else (please don't!), make sure the rest of the world can't read it!
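
If you'd rather script it, here's one way to do the zip-and-lock-down part, as a sketch. The cache directory and archive name are whatever applies to your client (the ones in the usage line are made up), and it assumes a Unix-style system where file modes mean something.

```python
import os
import zipfile
from pathlib import Path

def archive_cache(cache_dir, zip_path):
    """Zip a news cache directory into an archive only the owner can read."""
    cache_dir = Path(cache_dir)
    # Create the file with mode 0600 up front so it is never world-readable.
    fd = os.open(zip_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as raw, zipfile.ZipFile(raw, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(cache_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the cache directory.
                zf.write(str(path), str(path.relative_to(cache_dir)))
    # Also tighten permissions if the archive already existed with a looser mode.
    os.chmod(zip_path, 0o600)
    return zip_path
```

Usage would be something like `archive_cache("/path/to/news-cache", "news-cache.zip")`, with the archive written somewhere outside the cache directory itself.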


Update 11:50pm: Rusty has brought up the extremely good idea of sending the drives off to a data recovery outfit. Given that the drives aren't damaged, there's a lot of hope in recovering the data.

CSH Mail Problems Day 3

The restore from ITS is going tragically slowly (Angelo told me around 1:20 pm that we only had about 2.5 GB recovered out of ~23).

Actually, looking at the console just now, the recovery aborted, apparently a network glitch. FML.

We're now going to have ITS restore the data to some disks they have over there, just so that we can get the data off the tapes as fast as possible. Once that is done, we can swing by with some removable disks or something to get the data back to CSH.

Watching the restore right now, it looks like it's moving at around 5-15 MB/second, so it should be done in a reasonable amount of time.
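
To put a number on "reasonable", here's the back-of-the-envelope math as a quick sketch. It assumes the full ~23 GB has to be restored from scratch, that the rate holds steady, and that 1 GB = 1024 MB.

```python
def restore_minutes(total_gb, rate_mb_per_s):
    """Minutes needed to move total_gb at a steady rate (1 GB = 1024 MB)."""
    return total_gb * 1024 / rate_mb_per_s / 60

total_gb = 23  # approximate total from the earlier estimate
for rate in (5, 15):
    print(f"{rate} MB/s: about {restore_minutes(total_gb, rate):.0f} minutes")
```

That works out to roughly 79 minutes at 5 MB/second and 26 minutes at 15 MB/second.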

Update 4:30pm: Backup server index isn't showing anything in /usr, but reports indicate that /usr was backed up; guessing index is AFU. /mail_domain and such are inbound to hactar. Probably going to need to build base OS from scratch.. not looking forward to this. gotta run!

Update 6:51pm: Good news: mail and news spools are intact. Bad news: /usr and /var are gone. All this time, networker was backing up /usr/compat/linux/[usr,var]. I feel like an ass for never seeing this.
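
A check along these lines would have caught it. This is a hypothetical sketch, not anything networker provides: it takes the list of paths the backup software says it is saving (however you manage to get that list out of it) and reports which filesystems you care about aren't covered by any of them.

```python
from pathlib import PurePosixPath

def uncovered(required, backed_up):
    """Return required paths that no backed-up path equals or contains."""
    saved = [PurePosixPath(p) for p in backed_up]
    missing = []
    for req in map(PurePosixPath, required):
        # A path is covered if it, or one of its parent directories, is saved.
        if not any(req == s or s in req.parents for s in saved):
            missing.append(str(req))
    return missing

print(uncovered(
    ["/usr", "/var", "/mail_domain"],
    ["/usr/compat/linux/usr", "/usr/compat/linux/var", "/mail_domain"],
))
```

With the paths from this mess, that prints `['/usr', '/var']`: backing up a subdirectory of /usr does not cover /usr itself.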

CSH Mail Server Problems

Whitefox got rebooted due to a power problem, then the raid card refused to boot up because its hold-up battery is dead. We've been thus-far unsuccessful at convincing it to not give a damn. Seriously shitty hardware.

Our options:

1) Get the card to not care about the failed battery and boot anyway
Can't get the card to load its bios to even play around with settings/etc.

2) Get a new card
Can't find one.

3) Trick the card into thinking there's a battery there
Tried that, still complains.

4) Reinterleave the blocks by hand on another machine
Question marks.

5) Restore from backups onto a new array definition.
Possible loss of data.

It's looking like we're going to be forced to restore from backups, which I'm not looking forward to because we'll lose a little bit of mail and whatnot since the last backup.

Chris Lockfort has been working on it since it happened last night, but hasn't been able to recover it yet. I'll be going in today after work to help.

Update 6:10 pm:

We're going ahead with bare-metal restore from backups. The raid card is starting to respond, after some magic, so while we acquire backups from tape onto spare disk, we're going to continue to try to magic up the raid card.
Opcomm also has a twitter account now:
http://twitter.com/cshrtp