Had something of a breakthrough today. Late this afternoon I noticed the website go down and called Damien, who quickly spoke to the company that physically runs the server. As the situation was ‘live’, they logged in to try to figure out what was happening. They immediately spotted 79 simultaneous connections to various pages on my site, all from porn IPs and zombie machines, and those connections were completely overloading the system. Some very fast work with iptables calmed the situation somewhat, and we’re now blocking over 500 IPs.
Current theory: spam spikes have been causing the server to stop responding. The bots are all trying to leave comments or trackbacks, and although Akismet and Bad Behavior do a sterling job of blocking them, every request still has to be processed. We had one spike at 1400, which corresponds to 0900 on the east coast of the US, when thousands of zombie machines could plausibly be getting switched on (possibly confirmation bias, but not an unreasonable explanation). All the recent tweaking shored the server up so it lasted longer, but there’s little the average server can do against what amounts to a distributed denial-of-service attack.
I don’t know whether this was the primary cause of the crashes or a contributing factor, but either way it’s a very helpful discovery. I’ve set up a duplicate site on another server to see how well the site performs without the evil spammers, but I’m hopeful this site will be much more stable now.