  1. Using Google Spreadsheets to extract Twitter data

    Posted November 20, 2009 in How-to, twitter  |  28 Comments so far

    Update (5th December 2017): Several years ago, Twitter changed its API in a way that completely broke the process I describe below. I don’t know how you’d do the same thing today. It would probably help if you were some kind of white supremacist, going by where Twitter’s moral compass seems to be pointing.


    Last weekend I was looking for ways to extract Twitter search data in a structured, easily manageable format. The two APIs I was using (Twitter Search and Backtweets) were giving good results – but as a non-developer I couldn’t do much with the raw data they returned. Instead, I needed to get the data into a format like CSV or XLS.

    Some extensive googling led me to this extremely useful post on Labnol, where I learnt how to use the ImportXML function in Google Spreadsheets. Before too long I’d cracked my problem. In this post I’m going to explain how you can do it too.

    Data you can extract from Twitter

    This walkthrough will teach you how to extract two types of Twitter data using Google Spreadsheets – tweets and links.

    Tweets are extracted using the Twitter Search API in conjunction with ImportFeed. This allows Twitter search results to be extracted into a spreadsheet format.

    Links are extracted using the Backtweets API in conjunction with ImportXML. The Backtweets API allows you to find any links posted on Twitter even if they’ve been shortened using services like bit.ly or tinyurl.

    I’m in a hurry, can I just do this right now?

    If you just want to do it, rather than learn how to do it, open this Google spreadsheet I’ve created. You’ll need to make your own copy so you can edit it. Instructions can be found in the spreadsheet itself.

    How to extract tweets containing links

    The instructions below will help you create a Google Spreadsheet that pulls in and displays the time, username and text of all tweets containing links to a specified page. Because it uses Backtweets, these tweets will be retrieved even if they used shortened URLs from services like bit.ly or tinyurl.

    1. Create a new spreadsheet in Google Documents.
    2. Enter column labels in this order: “Search criteria”, “Timestamp”, “Username” and “Tweet text” in cells A1 to D1.
    3. In cell B2, underneath Timestamp, insert the following formula:
      =ImportXML("http://backtweets.com/search.xml?itemsperpage=100&since_id=1255588696&key=key&q="&A2,"//tweet_created_at")
    4. In cell C2, underneath Username, insert the following formula:
      =ImportXML("http://backtweets.com/search.xml?itemsperpage=100&since_id=1255588696&key=key&q="&A2,"//tweet_from_user")
    5. In cell D2, underneath Tweet Text, insert the following formula:
      =ImportXML("http://backtweets.com/search.xml?itemsperpage=100&since_id=1255588696&key=key&q="&A2,"//tweet_text")
    6. Now paste a search query into cell A2 – say, http://www.google.com. After a few seconds, you should see columns B, C and D fill up with tweets, looking something like the image below:
      [Image: Google Spreadsheet showing Backtweets results]

    The formulas pasted into cells B2, C2 and D2 all reference the URL in cell A2. This means that whenever you paste anything new into A2, the search results should refresh.

    Also, you can paste parts of URLs into A2 – not just entire ones. This is useful for seeing all links to a specific directory on your site, for example.

    Finally, this tool can only extract 100 results at a time – but it is possible to set it up to retrieve more than that. Look at my sample Google Spreadsheet if you want to do this.
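
    For anyone curious about what the ImportXML formulas in the steps above are doing, here is a rough Python sketch of the same idea: take an XML document and pull out every element with a given name, which is what an XPath query like //tweet_text asks for. The sample XML is made up for illustration. Only the three element names come from the formulas; the wrapper elements and all the values are my assumptions, and the real Backtweets response may have looked different.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the three element names come from the formulas
# above; the <results>/<tweet> wrappers and all values are made up.
SAMPLE_XML = """<results>
  <tweet>
    <tweet_created_at>2009-11-14 10:15:00</tweet_created_at>
    <tweet_from_user>example_user</tweet_from_user>
    <tweet_text>Handy search engine: http://www.google.com</tweet_text>
  </tweet>
  <tweet>
    <tweet_created_at>2009-11-14 10:20:00</tweet_created_at>
    <tweet_from_user>another_user</tweet_from_user>
    <tweet_text>Shortened link that resolves to http://www.google.com</tweet_text>
  </tweet>
</results>"""


def extract_column(xml_text, element_name):
    """Return the text of every element called element_name, at any depth.

    This mirrors an XPath query such as //tweet_text.
    """
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(".//" + element_name)]


timestamps = extract_column(SAMPLE_XML, "tweet_created_at")
usernames = extract_column(SAMPLE_XML, "tweet_from_user")
texts = extract_column(SAMPLE_XML, "tweet_text")

# Print the rows side by side, like columns B, C and D in the spreadsheet.
for row in zip(timestamps, usernames, texts):
    print(" | ".join(row))
```

    Each call returns one list, in document order, which is why the three spreadsheet columns line up row by row.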

    Extracting tweets from Twitter search results

    The method is the same as above, except that it uses the ImportFeed function instead of ImportXML.

    1. Create a new spreadsheet in Google Documents.
    2. Enter column labels in this order: “Search criteria”, “Timestamp”, “Username” and “Tweet text”. For the rest of this walkthrough, I’m going to assume that these labels are in cells A1 to D1, but in reality you can put them wherever you like.
    3. In cell B2, underneath Timestamp, insert the following formula:
      =ImportFeed("http://search.twitter.com/search.atom?rpp=20&page=1&q="&A2, "items created")
    4. In cell C2, underneath Username, insert the following formula:
      =ImportFeed("http://search.twitter.com/search.atom?rpp=20&page=1&q="&A2, "items author")
    5. In cell D2, underneath Tweet Text, insert the following formula:
      =ImportFeed("http://search.twitter.com/search.atom?rpp=20&page=1&q="&A2, "items title")

    6. Type a search query into cell A2 - say, "Hoth". Hit enter and the results will load. It should look something like this:
      [Image: Google Spreadsheets with data from Twitter search]

    Things will go wrong if you insert characters like # or @ into the search query. To get around this, type %23 instead of # and %40 instead of @. This will allow you to search for hash tags and usernames.
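
    The %23 and %40 substitutions are standard URL percent-encoding, so if you are preparing a lot of queries you don't have to convert them by hand. The sketch below uses Python's urllib.parse.quote to do it; the function name build_search_url is just something I made up for this example, and the base URL is the one from the formulas above. It only builds the URL and does not fetch anything.

```python
from urllib.parse import quote

# Base URL copied from the ImportFeed formulas above.
SEARCH_BASE = "http://search.twitter.com/search.atom?rpp=20&page=1&q="


def build_search_url(query):
    """Percent-encode the query (hypothetical helper) and append it to the base URL."""
    # safe="" makes quote() encode every reserved character, including # and @.
    return SEARCH_BASE + quote(query, safe="")


hashtag_url = build_search_url("#hoth")
username_url = build_search_url("@example_user")

print(hashtag_url)
print(username_url)
```

    The encoded text that follows q= is what you would paste into cell A2.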

    I haven't been successful in generating more than 20 search results per request, but you can get around this using the page number parameter in the ImportFeed query string. See my own Google spreadsheet to find out how to do this.
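
    To give a rough idea of how the page number parameter can be used, the sketch below generates one search URL per page, each asking for 20 results. This is not necessarily how my spreadsheet does it; it is only an illustration of the idea, and the function name paged_search_urls is invented for the example. Each URL would go into its own ImportFeed formula.

```python
from urllib.parse import quote


def paged_search_urls(query, pages, results_per_page=20):
    """Build one search URL per page (hypothetical helper, no network access)."""
    encoded_query = quote(query, safe="")
    return [
        "http://search.twitter.com/search.atom"
        f"?rpp={results_per_page}&page={page}&q={encoded_query}"
        for page in range(1, pages + 1)
    ]


# Three pages of 20 results would cover up to 60 tweets.
urls = paged_search_urls("#hoth", 3)
for url in urls:
    print(url)
```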

    I hope these instructions are useful - if you have any comments, questions or feedback, please let me know in the comments.


  2. Readability of online text

    Posted November 10, 2009 in user centred design  |  No Comments so far

    I’ve been trying to codify some guidelines for writing for the web recently, and came across this 2005 study (PDF) by Wichita State University’s Software Usability Research Laboratory.

    The study involved 66 graduate students with either normal or corrected vision being given a short story to read online. A preliminary reading test was carried out on participants so the study could predetermine their reading speed. Different text layouts were used, such as multiple column, full justification and so on. Study participants were tested for both reading speed and reading comprehension.

    • Reading speed: Multiple-column layouts impaired reading speed when text was left-justified. In a single-column layout, by contrast, left-justified text was read more quickly than full-justified text. The highest reading speed was 269.33 words per minute for two-column, full-justified text.
    • Reading comprehension: No significant variation was found across the different text formats.
    • Fast versus slow readers: Faster readers benefited most from the two-column, full-justified layout. Slower readers benefited from one-column, left-justified text.

    The study was perhaps limited by the fact that the participants, as university students, were heavier readers of online text than the average member of the population. I’d be interested to see if any similar studies have been carried out with a larger sample size, broader age range and a more representative mix of internet ‘natives’ versus internet ‘newbies’. Does anyone know of any? If I find some I’ll post them here.


  3. links for 2009-11-06

    Posted November 6, 2009 in links  |  No Comments so far