Last month I had a three-day battle with a broken RAID setup. I eventually brought it back to life, returned it to the client and, after a few days, sent an invoice. Then, a couple of days ago, it broke again. They called me in.
It’s a tough one. It could be the power supply throwing the occasional wobbly. It could be Random Motherboard Kak with the RAID chip. It could be user error. It could be something totally unrelated. Most of which are difficult to diagnose, and would involve trial and error see-if-it-lasts-this-time. They’ve decided to get a new computer instead. They also said they’d pay for my previous work, but made it clear they aren’t happy about doing so, as it “wasn’t repaired”. I think it unlikely they’d use me again.
Ugh. I hate these situations. I can see their point. But I did, in fact, get it working, and there was nothing to suggest the problem would recur. I didn’t charge anything extortionate, either. I’ll talk about it with a more knowledgeable friend to see whether there’s anything technical I should have done differently, but that’s kinda irrelevant - there are certainly situations where this could happen through no fault of my own.
I suppose from my perspective they’re paying me to attempt a repair, but to them they’re paying for a repair. I tend to assume people are aware of the former, and although I try to explain what I’m doing, maybe I need to be much more explicit about it. 95% of the time they amount to the same thing, and of the remainder it’s usually something I can tell them about pretty quickly, and charge a nominal fee. But in this kind of situation, the difference becomes important. Maybe I need to get a properly-written contract, so I’m covered. I’m probably leaving myself wide open without one, to be honest.
In the end I caved and offered to fit a couple of components into a separate machine for no extra charge, which mollified them somewhat. But I still feel like I’ve messed up, one way or another.
I remember getting my first hard drive that was bigger than 1GB, and thinking this was amazing. Today it takes half an hour to take 1GB of photos, so I obviously need much more space. Before this morning, my setup had one “500GB” and two “250GB” drives, but this actually added up to 927GB. This is because the manufacturers’ definition of a gigabyte (a billion bytes) differs from a computer’s definition of a gigabyte (1024 x 1024 x 1024 bytes). Today I added a second “500GB” backup drive (I saw far too much data loss this week, and it scared me) and passed 1TB for the first time. Meaningless, but a little milestone nonetheless.
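For anyone who wants to check the sums, here is a rough sketch of the conversion. The function name is made up for illustration, and it only covers the raw unit difference: it gives about 931 for my three drives rather than the 927 Windows showed, and I'm assuming the rest is down to exact drive capacities and formatting overhead.

```python
GB = 10**9    # the gigabyte printed on the box
GIB = 2**30   # the "gigabyte" the operating system reports

def reported_size(advertised_gb: float) -> float:
    """Convert an advertised size in decimal GB to the binary figure an OS shows."""
    return advertised_gb * GB / GIB

before = sum(reported_size(d) for d in (500, 250, 250))
after = before + reported_size(500)

print(f"before: {before:.0f}, after: {after:.0f}, 1TB mark: 1024")
```

So the extra drive takes the total comfortably past 1024, even by the computer's stricter definition.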
I’m trying to work out if I’ll ever hit the next milestone: 1024 terabytes = a petabyte. Let’s say I become a professional sports photographer, or something, and take 8GB of photos per day. Even with that, it’d still take 342 years to hit a petabyte. I’d need 55GB of photos per day to hit a petabyte within 50 years. For my camera that would be 6875 photos/day, while the most expensive Canon SLR, at 22MB/photo, would need 2500 shots. Nah, I can’t see individual photographers needing that much space for a long, long time.
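The back-of-envelope maths, as a quick sketch. Two things to flag: the 342-year figure comes out if you treat a petabyte as a round 1,000,000GB (with 1024 x 1024GB it is nearer 359 years), and the 8MB-per-photo size for my camera is inferred from the numbers above rather than measured.

```python
DAYS_PER_YEAR = 365

PB_ROUND_GB = 1_000_000      # round-number petabyte, matches the figures in the post
PB_BINARY_GB = 1024 * 1024   # 1024 terabytes of 1024GB each

def years_to_fill(capacity_gb: float, gb_per_day: float) -> float:
    """How many years it takes to shoot capacity_gb at a steady daily rate."""
    return capacity_gb / gb_per_day / DAYS_PER_YEAR

def daily_rate_needed(capacity_gb: float, years: float) -> float:
    """GB per day needed to fill capacity_gb within the given number of years."""
    return capacity_gb / (years * DAYS_PER_YEAR)

def photos_per_day(gb_per_day: float, mb_per_photo: float) -> float:
    """Shots per day, using 1000MB per GB to match the rounding in the post."""
    return gb_per_day * 1000 / mb_per_photo

print(round(years_to_fill(PB_ROUND_GB, 8)))       # 342
print(round(years_to_fill(PB_BINARY_GB, 8)))      # 359
print(round(daily_rate_needed(PB_ROUND_GB, 50)))  # 55
print(photos_per_day(55, 8), photos_per_day(55, 22))
```

Either way the conclusion holds: it's centuries at any sane shooting rate.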
I’m trying to rescue a dying hard drive today. It’s suffering from the Click of Death, which means it’s going down no matter what, but it’d be really, really nice if I could get at its data.
I regularly deal with laptop hard drives. 95% of the time they’re slowly dying, and once a Windows system file conks out, I get called. This almost always turns out fine: I quickly copy the still-intact data, slap it all onto a new hard drive, and run a repair install / restore disk. But just occasionally the drives go downhill fast. In today’s case Windows broke at the weekend, and by the time I got there on Tuesday the drive was clicking. Clicking is not good - it means the drive is physically failing to read the data. If it won’t spin up, I can’t do anything.
There’s a solution, but it’s not cheap: you can send the disk to a data recovery centre. They’ll open the drive in their cleanroom and (I assume) transfer the data platters to something which reads them directly. Assuming the platters aren’t physically damaged, this will probably work well. But it’s very expensive - quotes this morning suggested ~£300 for a 40GB drive - and I don’t know anybody who’s actually done it. Because, with laptops, the lost data are usually sentimental rather than critical. It’s not worth that expense, but people are still sad to lose it. This sucks.
I hate it when I can’t recover data. Obviously, everyone should have backups etc., but saying so is all well and good - in practice, most people don’t. And it’s still heartbreaking to lose, say, years of photographs. But there is one last, desperate trick you can try before paying a fortune / giving up. Put the drive in the freezer.
Honest. It contracts the metal, and has been known to bring drives back from the dead. Until they warm back up…but I only need 15mins for a drive image. I’m trying this today.
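For the curious, this is roughly what "taking a drive image" means: read the whole device from start to finish, and carry on past anything unreadable rather than giving up. The sketch below is illustrative only, not the tool I actually use. The function name and chunk size are my own choices, and it assumes the drive can be opened like an ordinary file (true of device nodes on Linux, less so on Windows). In practice a dedicated tool such as GNU ddrescue is a much better bet on a dying drive.

```python
import os

def image_drive(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src to dst chunk by chunk, zero-filling any chunk that fails to read.

    Returns the byte offsets of the chunks that could not be read.
    """
    bad_offsets = []
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # works for plain files and Linux block devices
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            src.seek(offset)
            try:
                data = src.read(want)
            except OSError:
                # Unreadable region: note it and pad the image so offsets stay aligned
                bad_offsets.append(offset)
                data = bytes(want)
            if not data:
                break  # the source ended earlier than its reported size
            dst.write(data)
            offset += len(data)
    return bad_offsets
```

The point of the zero padding is that everything after a bad patch stays at the right position in the image, so recovery software can still make sense of it later.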
The drive in question refused to stop clicking, so I shoved it in the freezer for an hour. I then quickly slapped it into an external USB caddy, hit the power and…I’m pretty sure it span up. Laptop drives are very quiet, but if I tilted it there was a definite force, so something was happening. Windows said “I’ve found a drive!”. And then sat there. And sat there. I reset the enclosure to try and kick things back into life, and this set it clicking again. Damn.
As I said, this is a last-ditch strategy. I’m really hoping that a bit longer in the freezer will do the trick - some say they’ve had drives fail after 4h but work after 24h. I’ll give it another few hours and try again. If that doesn’t work, I’ll try it overnight. I’d really like to get this one.
Update after another 2hrs: still nothing. It spins up, then starts clicking. Can’t think there’s much hope, to be honest, except there was that all-too-brief ‘disk drive found’ message from Windows…
Update 2: Sadly, this didn’t work. After an overnight freeze it refused to do anything for a minute, then just clicked as ever. I guess this type of click wasn’t the freezer-solvable one. Damn.
I totally messed up this weekend. I spent the whole time on my own, trying to fix that RAID array, and pissed off at least two groups of people I was supposed to meet up with. The computer’s finally all working as of 0130, but I can’t possibly charge for all the time I spent. I hate being beaten by problems, is the thing, and I have a bad habit of taking it personally when I can’t figure things out. But this was just silly, and I crafted a situation with no upside. Damn it.
I’ve been grappling with a broken RAID setup this weekend. I was given the computer with little more than “it’s broken”, and it’s taken a while to diagnose.
It wasn’t booting. It got so far as ‘listing pci devices’ and conked out. Usually you’ll see an error in such situations, but this one, helpfully, just hung. This was when I discovered the RAID0 setup. As far as I can tell, it came from the store with this configuration, which is stupid. RAID0 sucks. It lets you link multiple drives into one big space, and I think there are speed benefits, but this is all outweighed by the data being dependent on all the drives staying healthy. If any drives fail, you lose everything. Not good.
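To show why one dead drive takes everything with it, here is a simplified sketch of how RAID0 spreads data around. It assumes the stripe unit is a single block and the function names are mine; real controllers use a configurable stripe size, but the principle is the same.

```python
def stripe_location(block: int, num_drives: int) -> tuple[int, int]:
    """Map a logical block number to (drive index, block number on that drive)."""
    return block % num_drives, block // num_drives

def drives_touched(blocks, num_drives: int) -> set[int]:
    """Which physical drives hold at least one of the given logical blocks."""
    return {stripe_location(b, num_drives)[0] for b in blocks}

# A file occupying eight consecutive blocks on a two-drive array
for block in range(8):
    drive, position = stripe_location(block, 2)
    print(f"logical block {block} -> drive {drive}, block {position}")

print(drives_touched(range(8), 2))
```

Consecutive blocks alternate between the drives, which is where the speed comes from (both drives read at once), and also why any file bigger than one stripe has pieces on every drive.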
But the drives were fine: both passed a sector scan without issue. The RAM checked out too. For a while I thought it might be a boot sector thing, then eventually I slipstreamed an XP disc with the required RAID drivers, and the initial install process reported no partitions. Ok - maybe they got deleted somehow. But how best to investigate? Usually this is easy - just whack the drive into another computer, and run whatever data recovery is appropriate. But RAID is finicky, and I was wary. One wrong move and you’ve broken the array and made data recovery infinitely more difficult. I really wanted to leave the drives alone as much as possible.
Eventually I shoved in another drive, installed XP onto it (which wasn’t without evil BSOD complications), hooked up the RAID and ran Active@ Partition Recovery. This took an hour to find two deleted partitions, one of which contained all the user data - perfect! I hit the ‘Recover’ button and Active@ said ‘Please pay for the full version’. Now, I’m sure there’s freeware that can undelete partitions. I’m sure I could even do it manually, if I did the research. But the hell with it - the ‘recover’ button was right there, so I paid the £27 for the full version. This fixed the MBR and boot sectors, and mounted the drive in Windows.
Windows said ‘wtf something is b0rked here’. The partition was back, and Active@ could list its files, but Windows couldn’t quite figure it out. This is the kind of thing at which Scandisk excels. It usually works very well. But occasionally it’ll break things beyond belief, and a backup is advisable. So I switched to my favourite data recovery program: Restorer 2000 Pro. This little utility has saved me many, many times over the years. It scanned the major partition, and has spent the last six hours transferring all the data to yet another drive.
I’m currently waiting for Scandisk to complete. I think it’s adding index entries to every file on the disk. Either that or it’s stuck in an infinite loop. Time will tell.
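A note on what "no partitions" means at the disk level. The MBR is the first 512-byte sector, and it holds a four-entry partition table; if those entries get zeroed, the installer sees an empty disk even though the data is still sitting there. The sketch below reads that table from a raw sector. It's my own illustration of the standard layout, with made-up function and field names, and I don't know how Active@ does its scanning internally.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446   # partition table starts here, after the boot code
ENTRY_SIZE = 16      # four entries of 16 bytes each
BOOT_SIGNATURE = b"\x55\xaa"

def parse_mbr(sector: bytes) -> list[dict]:
    """Return the non-empty partition entries from a raw MBR sector."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("no valid MBR boot signature")
    partitions = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        # status byte, 3 CHS bytes (skipped), type byte, 3 CHS bytes (skipped),
        # then little-endian first sector and sector count
        status, ptype, first_lba, sectors = struct.unpack("<B3xB3xII", entry)
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append({
                "slot": i,
                "bootable": status == 0x80,
                "type": ptype,
                "first_lba": first_lba,
                "sectors": sectors,
            })
    return partitions
```

Feed it the first 512 bytes of a disk image and an empty list back means the table has been wiped, which is the situation a partition recovery tool then has to dig its way out of.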
Charging for this kind of work is always difficult. Half the time is spent waiting for scans to complete or data to transfer - I’ve got through half of The Diamond Age this weekend - but it’s not like you just leave it running, either: there’s always some query that means you have to check it every five minutes (Restorer 2000, for example, has a strop if you try to recover too many directories from the root at once, so you have to be on hand to manually start the process every quarter of an hour). Charging a full hourly rate would obviously be hideously expensive and morally wrong, but you obviously don’t want to feel like you’re wasting your time. You also can’t always predict how long something will take, so you can’t say to the client “I’ll do £x amount of work then give you a call”. It just doesn’t work that way - oftentimes stopping halfway through would mean leaving the computer in an even worse state. I tend to add it up and see what feels reasonable. I’m not going to charge more than the computer’s worth, even if the job has taken that long. I know people who tell me I’m wrong, but most of my work is for individuals with their home computers, and I don’t think it’s fair to charge silly money.
Ho hum. Scandisk is still indexing, and the drive’s chugging. Man, I really hope it’s doing something useful.
Yesterday’s post brought me two toys:
The ‘brella is mine. The laptop I’m setting up for a friend. But this is no ordinary laptop, this is an eee pc. Alice of the wonderful Wonderland got one a while back, and her initial possible-typo thought has been ringing around my head for 48hrs, because it sums the thing up perfectly: IT TITCHY! Here’s a better picture, actual size:
See? It titchy! I’m in love. It’s 23 x 17cm and in its case weighs 976g, which isn’t much more than a large book, or my camera. It has wifi, 512MB RAM, three USB slots, a 3hr battery, a VGA port, an SD-card slot, two speakers and a webcam. It runs Linux, boots in 15 seconds, shuts down in 5 and comes with OpenOffice.org, Firefox and Skype. Best of all, it only cost a shade over £220 - brand new.
Clearly there’s a compromise somewhere, and it’s mainly in power and disk space. It’s not at all fast - 630MHz - and the hard drive is only 4GB. Plus, the screen resolution is only 800×480, being as how it’s only 7″ on the diagonal. But if all you want to do is surf, type and chat, you don’t need any more than that. Couple this thing with an apparently-compatible Huawei PAYG Mobile Broadband stick and you’ve got 1Mbps internet access you can throw into your bag just in case. Is brilliant.
The keyboard is obviously tiny tiny tiny, and takes some getting used to. But it’s at least a standard layout, and I adapted pretty quickly. The mouse ‘buttons’, it has to be said, are godawful, but thankfully the trackpad supports tapping. The machine recognised my USB drive straight away and I was able to transfer files from my XP machine without issue. The screen is just large enough that text is readable without straining, but it’s close.
The menu system is fairly unexceptional, and buries the good stuff in with a load of less-than-useful programs, but does the job. It’s not officially editable, but activate the ‘advanced mode’ and you’ve got the full configurability of Linux. About which I know nothing, but I had a crack anyway. The machine is popular enough that the eee wiki has many, many guides on unlocking advanced features without screwing everything up, and I went through a few step-by-step. The instructions suffer from the usual crowd-sourced documentation problems in that they can veer from incredibly useful to ‘oh, and before you do the next step you’ll need to rebuild the kernel - once you’ve done that…’, but are on the whole good. It has a problem out-of-the-box that prevents it from connecting to wireless networks that have WPA keys containing spaces; I was able to fix this by overwriting a couple of system files. I also tidied up the default layout, upgraded to OpenOffice.org 2.3, and enabled the option to boot into KDE. You can do much more - and for £40 you can upgrade it to a touchscreen(!) - but as it’s not mine I stopped there.
I’d be saving up for one, but it’s no use at all for anything photographic. Sure I could probably shove the GIMP on there, or even try XP and Photoshop if I thought I could handle the speed, but the 4GB drive is just too small. My camera’s memory card is twice that, so it’d be no use for backing up ‘in the field’, and I can’t imagine that editing a 3888×2592 file on that screen would be much fun. The eee has inspired a whole host of other micro-laptops, but they all seem to be coming in far more expensive, sadly.
My friend spends 4hrs on a wifi-enabled bus every day, but gets fed up of lugging a full-size laptop around. This should be a perfect solution, and I have to hand it over tomorrow. I am sad. I’ve named it and everything. Still, at least I have a ‘brella.
I’ve been exhausted today, after a heavy weekend. A friend invited me to help install and configure a startup’s network, and both nights neither of us got to sleep until 0300.
The company had quite the setup: 24″ monitors, VoIP phones, a beautifully-sunlit open-plan office, Aeron chairs, the lot. Their building had network wiring already, and it was our job to get everything connected and talking to each other (or not, if you’re a VoIP phone and a PC). I’ve never configured anything quite so high-end before. We had Sawyer the 24-port gigabit ethernet switch (brawn, didn’t need to do anything fancy), Jack the 24-port fast-ethernet switch (less powerful, but needed to do clever routings) and Hurley the wireless router (wireless = the cool bit) all connecting to Kate the ultra-configurable mega-secure Cisco router (ultimately in charge, and physically under both the switches). Everyone needed internet access, and it all had to work via DHCP - all settings being supplied automatically once connected to the wall / wireless. Each component threw up problems at times, and it was quite the challenge.
As ever, the toughest problems were sometimes the fastest - denying intra-subnet communication took five minutes, despite being a major worry - while the insignificant things ate up time - the network printer Just Didn’t Respond, and took two hours to fix. At times we delved into Cisco’s formidable command-line-interface, and discovered various deficiencies in their generally ultra-swish GUI. We also ate a lot of muffins. And bon-bons.
By 0130 on Monday morning everything was wired up and talking to each other. It was quite the relief! Today we heard nothing until this evening, when a call said everything had run fine. This is pretty rare - there’s always something broken - and we’re concerned they’re using next door’s wireless.
There was a hell of a learning curve and the pressure got to us both at times, but it was great fun nevertheless. I’ve also grown quite fond of Cisco routers. You might need a degree in jargon to configure the things, but they’re seriously powerful toys.
In the corner of my parents’ office is a small computer acting as a file and email server. It’s a little workhorse that’s been going for ages, but late last year it started turning itself off. Mum and Dad would arrive in the morning and find it inexplicably lightless, despite no power cuts.
I tracked it down to the weekly full HD-clone backup, and could at least reproduce the problem: the machine just conked out, with no blue-screen or anything in the Windows event log. This suggested it was hardware-related, and there were a few possibilities, the most likely being one of the hard drives. I tried running full sector scans, but it conked out halfway through. I took them away, but found no errors. The next most likely cause (I thought) was the power supply. So I replaced that and the next backup promptly reported sector errors on one of the hard drives. So I replaced that too, and the backup completed - yay!
A week later: same problem.
So I swapped out the RAM. No difference. At this point I was starting to think it must be Random Motherboard Crap. Sometimes you get a problem you can’t trace, so you replace the motherboard+CPU, and everything’s ok but for the lingering feeling that maybe you missed something. In this case, I would indeed have missed something if I’d replaced the lot.
Last week I sent an email around to various friends, asking if they had any other thoughts. My friend Ben got back to me the same evening with a list of things to try, one of which was a stress test - did it fall over under high CPU load? I tried Prime95 last night and the machine fell over in minutes - far less time than the backup typically took to take down the system. It was pretty late by this point, so I stopped for the night and while driving home suddenly realised the incredibly obvious possibility.
This evening I ran the old-school Motherboard Monitor, and watched as the stress test took the CPU temperature up to 105 degrees Celsius. 105 degrees! That is a ridiculously high temperature for a non-ancient CPU. So I took a look inside, really really hoping I hadn’t somehow missed the CPU fan having fallen off, or something. No - the fan was in place and still going, but there was a bit of dust clogging the heatsink. I vacuumed it up and re-ran the stress test.
It peaked at 66. A not-crazy amount of dust had increased the temperature by nearly 40 degrees! It survived the stress test without issue, and is currently running a full backup for the first time in months.
I’m a little annoyed I didn’t think of this. Overheating used to be the go-to problem for random shutdowns, but modern computers run so cool that it’s now pretty uncommon. But it shouldn’t have taken me four months to figure it out. Oh well, at least they’ve got a shiny new power supply.
How come the backup completed that once? Could be chance, but I bet I left the side of the case off, having just installed the new drive. Still didn’t twig, though.
I think Ben was thinking ‘overheating’, but didn’t want to say it so bluntly so I wouldn’t feel bad. He’s subtle that way. Thanks, Ben!
I temporarily plugged a monitor into a Vista machine, and ever since the monitor has refused to display anything but 640 x 480, reporting ‘unsupported mode’ for all other resolutions. Irritating.
My desktop speakers are rubbish, and one will regularly cut out, requiring picometre adjustments of the phono connector. Except for this week, when setting off a Windows-error ‘bing’ will fix the problem. I am aware this sounds stupid, but that’s three times now I’ve been playing something in iTunes and had the dead speaker splutter into life in time with a beep.
On Dell’s online system configurator, in the midst of questions like ‘do you want Office?’, ‘do you want a 20″ widescreen monitor?’ etc., is this:
This is just a BIOS option that takes half a second to change, but I don’t really object to Dell charging as it’s not something many people know about. However, I’m pretty sure any computer will boot to the hard drive by default - what does the first £3 actually buy you?
Manic day. A job I estimated would take five minutes stretched into five hours. Simply put: a company can’t connect to a couple of popular IPs, and I’m damned if I can find a reason why. Incredibly frustrating. I’ll head back there tomorrow, but am hoping some ideas pop into my head overnight. Then this evening I nipped over to Nottingham to pick up my forgotten camera, which I’ll be needing for the upcoming dance weekend. It was nice to see Abi too, of course
On the way back I listened to the latest in Escape Pod’s audio of the Short Story Hugo nominees. Impossible Dreams and today’s The House Beyond Your Sky were both excellent. I don’t envy the judges one bit.
Ugh, it’s nearly 0200. Bad.
Despite being up until ungodly hours last night, today was rather productive.
I had a callout to a petrol station this morning. A power spike had fried the forecourt computer and broken all communication with the pumps. The computer was old. You know how when Arthur Miller died you thought ‘he was still going? Wasn’t he married to Marilyn Monroe back in the 60s?’. Old like that. My hopes of recovery faded when I saw the AT power supply, but the motherboard turned out to have an ATX connector. Replacing an AT PSU with an ATX is always entertaining, mainly because AT PSUs connect(ed) directly to the power switch, and if the case isn’t set up for ATX it’s nigh on impossible to turn the thing on and off short of pulling out the main cable. There was no other quick solution, however, and after a lot of cable routing and swearing at symmetrical IDE connectors the thing finally powered up. Half an hour’s troubleshooting later and it started communicating with the pumps and we brought in the ‘no petrol’ signs. It wasn’t a difficult job, but was particularly satisfying when customers immediately started pulling in.
I’ve also finally completed my application for the Westminster University Photography degree. Getting references, finding all my GCSE/A-Level certificates and writing a personal statement took longer than expected, but it’s now sitting in an envelope in front of me. I don’t know the official deadline date - nobody seemed to want to tell me - but my application will be sent special delivery in the morning. I’m very concerned it’s too late, but it’s worth a try.
I used to use two monitors. Being able to quickly maximize windows side-by-side is extremely useful, especially when photo/website editing, but my flat hasn’t the space for two large CRTs so I traded them for one widescreen LCD when I moved in. I’m very happy with it as a monitor, but it’s not the same. I recently got hold of a spare 15″ LCD, and hooked it up this morning. The resolutions don’t match and the two are at very different heights - moving the mouse from one to another is a weird experience - but it works well nevertheless. I’ve a bunch of Google Gadgets taking up much of the screen currently: Gmails, Google Talk, a scratch pad, a disk/bandwidth monitor and BBC News. The background nature of the Gadgets means they’ll happily sit behind anything I drag over there. It’s helpful being able to shove chat windows out of my main work area.
My uncle had his laptop stolen a couple of months ago, and the insurance just paid out with PC World vouchers. I can see their logic, but it seems unreasonable: the original laptop was chosen for his needs from a wide range of different suppliers, then further customised. We visited PC World this afternoon and found their range small and of poor value compared to the original purchase. I understand the insurers want to avoid him spending the money on other items (although the PCW vouchers could theoretically be used for ten thousand mousemats) but couldn’t they request he get a replacement and supply the receipt?