Close call with Google Docs

I just came so very close to losing my afternoon’s work. I started a new Google Document at around 1500, and it’s been open ever since. I noticed at around 1800 that the ‘saving’ message was there continually, but I couldn’t say when it had appeared. I left it for an hour, came back and there was no change. ‘Save and Close’ didn’t work, so I closed the tab. I refreshed Google Docs to be informed the document hadn’t changed since 1507. I opened the document and saw a blank screen. Oh, crap.

Thankfully, a little stab of paranoia had saved the day. When I originally went to close the tab, the ‘you have unsaved changes’ message popped up. It always does, but for once I paid attention. I select-all/copied the entire document as a last-ditch backup, and thankfully this was still in the clipboard a few moments later[1]. I lost the page breaks, but that’s no big deal.

That’s the first time I can recall Google completely killing a document. That would have been bad. So, if in doubt, reload Google Docs in another tab and check the modified time.

  1. Actually I had another backup layer, as I run the security-nightmare but oh-so-very-useful Ditto.

A basic setup to link Google Spreadsheets with Amazon ECS

It took a while to surface, but over the weekend I finally spotted the obvious possibilities raised by Google Spreadsheets’ new XML-linking features. Abi and I currently enter ISBN, title and author data into our bookselling spreadsheet by hand, and this gets old quickly. More so for Abi, who has to check the individual Amazon pages to gauge current prices. I knew of Amazon’s API, and was planning to create something to help speed this up, but the only programming language I was ever any good at was Visual Basic, which isn’t around much any more, and I’m seriously rusty at the little .NET I once knew. So I wasn’t quite sure of the best way to approach things, and GS’ new hotness seemed like an ideal solution, if I could get it to work. Happily it did, and I can now enter an ISBN and have GS download the author, title, Amazon URL, Amazon price and lowest used price automatically. Here’s a basic way to set it up:

  1. If you haven’t got one, sign up for an Amazon Web Services account and find your Access Key. This’ll let you make 1 request per second, although in practice this is fairly flexible.
  2. Take a look at the Amazon ECS API, and determine the appropriate request. It’s a web service, so supplies data after a standard browser request. Here’s the query I’m using, with an example book:

    http://webservices.amazon.co.uk/onca/xml?Service=AWSECommerceService &SubscriptionId=[your_ID] &ResponseGroup=Medium &SearchIndex=Books &Operation=ItemLookup &IdType=ISBN &ItemId=0593055489

    The most important part of this is the ResponseGroup, which determines how much data Amazon supplies. Medium is a good all-rounder, but there are plenty of more specific request types. Specifying ISBN as the IdType allows the use of 13-digit ISBNs, which a standard ItemLookup doesn’t – Amazon’s documentation claims it doesn’t work on the UK site, but it does.

  3. Test out the request in your browser window, and make sure it’s supplying all the necessary data.
  4. Now it’s time to figure out the XPath statements you’ll need to extract the relevant info. If, like me, you haven’t the foggiest idea what XPath means, don’t worry. It’s pretty easy, and the W3Schools tutorial takes 10 minutes to explain everything you’ll need.
  5. In this case, I want to extract the Amazon price, the lowest used price, the URL of the detail page, the (first) author and the title:
    • The price I want is in the FormattedPrice element, underneath the ListPrice node. There’s more than one ‘FormattedPrice’ element in the results, so I need to specify its parent node. The appropriate XPath statement is “//ListPrice/FormattedPrice”
    • The lowest used price is the same element, but under LowestUsedPrice, so: “//LowestUsedPrice/FormattedPrice”
    • The URL is in the “DetailPageURL” element, and there’s just the one of these, so: “//DetailPageURL”
    • Similarly with title: “//Title”
    • There may be multiple authors, and grabbing all of them would mess up the spreadsheet formatting we’ll come to in a minute. So I’ll just grab the first one: “//Author[1]”. Note that, unlike most programming languages, XPath indices start at 1 rather than 0 – “//Author[0]” doesn’t match anything.

    These can all be combined into one XPath statement with | operators, so the final result is:

    “//ListPrice/FormattedPrice | //LowestUsedPrice/FormattedPrice | //DetailPageURL | //Author[1] | //Title”

  6. Ok, fire up a spreadsheet. Create a column for ISBN, price, lowest price, url, author, title and finally another for processing.
  7. GS uses the ‘importXML’ function to import data. This takes two arguments – the URL and the XPath statement. Enter an example ISBN, then, in the ‘processing’ column:

    =importXML("http://webservices.amazon.co.uk/onca/xml?Service=AWSECommerceService&SubscriptionId=[your_ID]&ResponseGroup=Medium&SearchIndex=Books&Operation=ItemLookup&IdType=ISBN&ItemId="&B2,"//ListPrice/FormattedPrice | //LowestUsedPrice/FormattedPrice | //DetailPageURL | //Author[1] | //Title")

    The &B2 at the end of the URL appends the ISBN number, so be sure to change this to the appropriate cell.

  8. Google should now go fetch the XML file, process it according to your XPath statement and…dump all the data into the ‘processing’ column. If you click on the cells containing the data you’ll see they have ‘Continue’ formulae. This (currently undocumented) formula simply reads a value out of the array stored in the first cell: “=CONTINUE(H2, 5, 1)” shows the value in row 5, column 1 of the array that starts in H2 – in this case, the book Title. Copy and paste these statements to the appropriate cells on the correct row.
  9. We need to stop Google filling the subsequent rows, so enter a random text string in the second row of the processing column, then enter a different ISBN into the first row to refresh the query. Google will again fetch the data, then ask if you want to overwrite existing data. Say no, and it’ll only populate the Continue formulae we just set up on the correct row.
  10. CTRL-D the formulae down a few rows, and voila, you’ve got an easy-to-use ISBN query spreadsheet that uses one query per row. If you want to tidy it up, wrap it in an if statement like:

    =if(B2="","",importXML("[your_URL]"&B2,"[your_XPath]"))

    isblank() doesn’t seem to work properly with cells that have had content deleted.
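The whole pipeline – build the request URL, fetch the XML, pull out the five fields – can be sketched outside the spreadsheet too. Here’s a minimal Python sketch. The response snippet is an invented, trimmed stand-in for a Medium ItemLookup result (real ECS responses wrap everything in an AWS namespace, which Google’s importXML appears to ignore but an XML library would need handling explicitly), and the author, title and prices are made up:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Step 2's request, built programmatically ([your_ID] stays a placeholder).
params = {
    "Service": "AWSECommerceService",
    "SubscriptionId": "[your_ID]",
    "ResponseGroup": "Medium",
    "SearchIndex": "Books",
    "Operation": "ItemLookup",
    "IdType": "ISBN",
    "ItemId": "0593055489",
}
url = "http://webservices.amazon.co.uk/onca/xml?" + urlencode(params)

# Invented, trimmed stand-in for a Medium response (namespace omitted).
SAMPLE = """\
<ItemLookupResponse>
  <Items>
    <Item>
      <DetailPageURL>http://www.amazon.co.uk/dp/0593055489</DetailPageURL>
      <ItemAttributes>
        <Author>A. Author</Author>
        <Title>An Example Book</Title>
        <ListPrice><FormattedPrice>£9.99</FormattedPrice></ListPrice>
      </ItemAttributes>
      <OfferSummary>
        <LowestUsedPrice><FormattedPrice>£2.50</FormattedPrice></LowestUsedPrice>
      </OfferSummary>
    </Item>
  </Items>
</ItemLookupResponse>
"""

# One path per output column, mirroring step 5's combined XPath.
# find() returns the first match, standing in for //Author[1].
PATHS = {
    "list_price": ".//ListPrice/FormattedPrice",
    "used_price": ".//LowestUsedPrice/FormattedPrice",
    "url":        ".//DetailPageURL",
    "author":     ".//Author",
    "title":      ".//Title",
}

def extract(xml_text):
    """Return one value per named field, or None if it's missing."""
    root = ET.fromstring(xml_text)
    return {name: (el.text if (el := root.find(path)) is not None else None)
            for name, path in PATHS.items()}

print(url)
print(extract(SAMPLE))
```

Keying each value by name rather than by position also sidesteps the field-ordering problem noted under ‘Problems with this method’ – something the spreadsheet’s positional CONTINUE() cells can’t do.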

Problems with this method:

  • Queries are currently limited to 50. And that’s 50 queries present, rather than active – hiding them behind an if statement based on the contents of another cell doesn’t work. This is easy to get around via copy-and-pasting, but could be annoying in some circumstances.
  • Queries update every two hours. That’s unnecessary for this purpose – I’d personally like to see unlimited one-time queries that never refresh; hopefully that’ll come.
  • Amazon theoretically have a 1 query-per-second limit. I’ve actually dumped 25 ISBNs at once and had results appear instantly, so I think they must have a flexible policy, but it’s a little risky and I don’t want my account suspended. This is another reason I’d like a one-time-only query option. It’s probably wise to play it safe by cutting-and-pasting the data into another table before closing the sheet.
  • If Amazon doesn’t have a particular datum, the XPath results will be in the wrong order. For example, some annuals and Reader’s Digest books have no author listed, so the ‘Continue’ statement that should refer to the 4th field (in this case, the author) instead refers to the title. I can’t think of a way around this without multiple queries.
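That last ordering problem is easy to reproduce off-spreadsheet. A union XPath returns its matches in document order, so a missing element silently shifts everything after it, whereas looking each field up by name is immune. A small Python sketch with two invented response fragments:

```python
import xml.etree.ElementTree as ET

# Invented fragments: one normal item, one with no author listed.
WITH_AUTHOR = "<Item><Author>A. Author</Author><Title>Book One</Title></Item>"
NO_AUTHOR = "<Item><Title>An Annual</Title></Item>"

def by_position(xml_text):
    # Mimics the CONTINUE() cells: union results arrive in document
    # order and get picked out by index.
    return [el.text for el in ET.fromstring(xml_text).iter()
            if el.tag in ("Author", "Title")]

def by_name(xml_text):
    # One lookup per field: a missing element yields None instead of
    # shifting later fields into the wrong column.
    root = ET.fromstring(xml_text)
    return {tag: (el.text if (el := root.find(f".//{tag}")) is not None else None)
            for tag in ("Author", "Title")}

print(by_position(NO_AUTHOR))  # ['An Annual'] -- the title lands in the author slot
print(by_name(NO_AUTHOR))      # {'Author': None, 'Title': 'An Annual'}
```

Inside Google Spreadsheets there’s no per-field lookup like this, so the multiple-queries workaround mentioned above is probably the only real fix; the sketch just shows why the misalignment happens.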

Hopefully I’ll be able to refine this in the future, but it’s not a bad start, and I’m very impressed that the functionality exists at all.

Googly bits

Google Spreadsheets now has an autofill box, meaning I can drag a cell to automatically extend a series. So if I enter ‘Monday’ and ‘Tuesday’ in consecutive boxes, then highlight and drag the node, it’ll fill in the rest of the days of the week. This is a useful feature in offline applications, and will undoubtedly come in handy online, but there’s a hidden payload: press control while dragging and it’ll look for correlations in Google Sets. I entered ‘monkeys’ and ‘cats’, and it extended the third box with ‘iguanas’. ‘Han’ and ‘Leia’ produced ‘Luke Skywalker’, ‘C3PO’ and ‘R2D2’. Now that’s pretty cool.

GS also added support for external data. Online XML, HTML, RSS feeds, and .csv files can all be referenced, with imported data updated every few hours. That’s extremely powerful, especially when you combine it with the ability to publish – I can now link directly to the data in another user’s Google Spreadsheet. Since the code’s already there, I imagine this’ll be extended to Google Docs pretty soon.

Google Reader now finally has a search box, and also maxes out counts at 1000 rather than displaying ‘100+ items unread’. That’s pretty much all the missing functionality I wanted.

Finally, Gmail and ‘Picasa Web Albums’ users can now buy more storage space, with 6GB costing $20/year. 250GB is $500/year. Clearly, nobody needs 250GB for email and pictures…Google Drive, anyone?

Microsoft announce ad-supported version of Works

Microsoft makes most of its money from Windows and Office, and is under increasing pressure from competitors. Not for the OS – Linux is nowhere close to prime-time[1] – but there are plenty of Office competitors out there. Offline there’s OpenOffice.org, which emulates many of Word and Excel’s major features, while online there are the simpler but incredibly easy-to-use Google Docs / Zoho, which are far superior to their offline counterparts when it comes to sharing, portability and backup.

So, Microsoft today announced its move: a free, ad-supported version of Microsoft Works. By the end of the year.

Works. You remember. The crappy word processor and spreadsheet that people use until they find that Word / Excel can’t actually read Works’ default file type. With adverts. Adverts. Who bought Works anyway? It was always bundled with new PCs, and any salesman worth his salary would flog an upgrade to Office.

I think the web 2.0 reaction is: weaaaaak. How about an online version of Word, with all the extra functionality it offers over Google Docs etc., that people pay £5 a month for? Or a cut-down version of Word itself? Or an ad-supported version of Office, for non-commercial use? Anything but Works.

  1. Most recent problem: no networking, which turned out to be because I’d gone to the network dialog box via the menu system instead of the system tray icon. The same dialog box. Fixing it required editing text files. I rest my case.

Google Docs and Norton Internet Security don’t get on

I love Google Docs. It’s become my regular word processor / spreadsheet package, and the sharing features are incredibly useful. I used it to write my 50,000 word NaNoWriMo novel last year and it didn’t break a sweat. On the rare occasion I need proper WYSIWYG or more advanced calculations I’ll use OpenOffice, but Google Docs is the future for casual work. I’ve been extolling its virtues to Abi, and for the last couple of weeks she’s been building a spreadsheet. Yesterday it abruptly stopped working, and I couldn’t figure out why.

‘Trying to reach google.co.uk…’ flashed red in the top left after the spreadsheet opened, but it never connected. Her internet connection was working fine otherwise. The culprit turned out to be Norton Internet Security. I could find no exception lists for websites, so the only solution has been to disable it entirely (NAV still runs, and her router has a firewall anyway, so it’s no big loss). I’ve no idea why it suddenly started happening – maybe something from LiveUpdate? Or it detects the every-few-second-saves as some kind of attack, and blocks the IP? Whatever, I thought I’d put it here for Googlers with similar problems.

Warning for NaNo-ers using Google Docs

Google Docs (née Writely) says I’ve written 1709 words. The NaNo site says 1664. OpenOffice.org says 1670. I ran into this last year when I uploaded my (just) 50,000-word novel to be verified by the NaNo site and found it was 800 words short. Just something to bear in mind. This may well already be on the official forums, but they’re down atm.