Using Google Spreadsheets to extract Twitter data

Posted November 20, 2009 in How-to, twitter  |  27 Comments so far

Last weekend I was looking for ways to extract Twitter search data in a structured, easily manageable format. The two APIs I was using (Twitter Search and Backtweets) were giving good results – but as a non-developer I couldn’t do much with the raw data they returned. Instead, I needed to get the data into a format like CSV or XLS.

Some extensive googling led me to this extremely useful post on Labnol, where I learnt about how to use the ImportXML function in Google Spreadsheets. Before too long I’d cracked my problem. In this post I’m going to explain how you can do it too.

Data you can extract from Twitter

This walkthrough will teach you how to extract two types of Twitter data using Google Spreadsheets – tweets and links.

Tweets are extracted using the Twitter Search API in conjunction with ImportFeed. This allows Twitter search results to be extracted into a spreadsheet format.

Links are extracted using the Backtweets API in conjunction with ImportXML. The Backtweets API allows you to find any links posted on Twitter even if they’ve been shortened using services like bit.ly or tinyurl.

I’m in a hurry, can I just do this right now?

If you just want to do it – instead of learn how to do it – just open this Google spreadsheet I’ve created.  You’ll need to make your own local copy so you can edit it. Instructions can be found in the spreadsheet itself.

How to extract tweets containing links

The instructions below will help you create a Google Spreadsheet that pulls in and displays the time, username and text of all tweets containing links to a specified page. Because it uses Backtweets, these tweets will be retrieved even if they used shortened URLs from services like bit.ly or tinyurl.

  1. Create a new spreadsheet in Google Documents.
  2. Enter column labels in this order: “Search criteria”, “Timestamp”, “Username” and “Tweet text” in cells A1 to D1.
  3. In cell B2, underneath Timestamp, insert the following formula:
  4. =ImportXML("http://backtweets.com/search.xml?itemsperpage=100&since_id=1255588696&key=key&q="&A2,"//tweet_created_at")
  5. In cell C2, underneath Username, insert the following formula:
    =ImportXML("http://backtweets.com/search.xml?itemsperpage=100&since_id=1255588696&key=key&q="&A2,"//tweet_from_user")
  6. In cell D2, underneath Tweet Text, insert the following formula:
    =ImportXML("http://backtweets.com/search.xml?itemsperpage=100&since_id=1255588696&key=key&q="&A2,"//tweet_text")
  7. Now paste a search query into cell A2 – say, http://www.google.com. After a few seconds, you should see columns B, C and D fill up with tweets, looking something like the image below:
  8. Google Spreadsheet showing Backtweets results

  9. The formulas pasted into cells B2, C2 and D2 all reference the URL in cell A2. This means that whenever you paste anything new into A2, the search results should refresh.
  10. Also, you can paste parts of URLs into A2 – not just entire ones. This is useful for seeing all links to a specific directory on your site, for example.

Finally, this tool can only extract 100 results at a time – but it is possible to set it up to retrieve more than that. Look at my sample Google Spreadsheet if you want to do this.

Extracting tweets from Twitter search results

The method for doing this is identical to the above, but uses the ImportFeed function instead of ImportXML.

  1. Create a new spreadsheet in Google Documents.
  2. Enter column labels in this order: “Search criteria”, “Timestamp”, “Username” and “Tweet text”. For the rest of this walkthrough, I’m going to assume that these labels are in cells A1 to D1, but in reality you can put them wherever you like
  3. In cell B2, underneath Timestamp, insert the following formula:
    =ImportFeed("http://search.twitter.com/search.atom?rpp=20&page=1&q="&A2, "items created")
  4. In cell C2, underneath Username, insert the following formula:
    =ImportFeed("http://search.twitter.com/search.atom?rpp=20&page=1&q="&A2, "items author")
  5. In cell D2, underneath Tweet Text, insert the following formula:
    =ImportFeed("http://search.twitter.com/search.atom?rpp=20&page=1&q="&A2, "items title")

  6. Type a search query into cell A2 - say, "Hoth." Hit enter and the results will load. It should look something like this:
  7. Google Spreadsheets with data from Twitter searchThings will go wrong if you insert characters like # or @ into the search query. To get around this, type %23 instead of # and %40 instead of @. This will allow you to search for hash tags and usernames.

I haven't been successful in generating more than 20 search results per request, but you can get around this using the page number parameter in the ImportFeed query string. See my own Google spreadsheet to find out how to do this.

I hope these instructions are useful - if you have any comments, questions or feedback, please let me know in the comments.


27 comments so far.  Post a comment

  1. adamnfish
    November 20, 2009 at 2:54 pm [ Permalink

    That’s awesome, thanks Brendan.

    Worth mentioning – you’ll want to sign up for a backtweets developer account if you’re using this a lot (free and easy – http://www.backtype.com/developers). They’ll give you an api key that you can put into the url. In the above examples, wherever it says “&key=key”, replace it with the key you are given (eg &key=3ac2a41b1d3db5cdfa3).

    It’ll work with key=key for now, but you’re likely to hit the rate limit if lots of people are mucking around with that (test) key.

  2. Lakshmi
    March 3, 2010 at 4:45 am [ Permalink

    Had problems with the twitter search, I onlu get a #NA, but the backtweets search works fine. What could be the reason?

    thanks
    Lakshmi

  3. March 10, 2010 at 10:29 pm [ Permalink


    Lakshmi:

    Had problems with the twitter search, I onlu get a #NA, but the backtweets search works fine. What could be the reason?

    thanks
    Lakshmi

    Hmmm, thanks for letting me know about that. I’ve taken a look and it looks like the ImportFeed function in Google Spreadsheets has simply stopped working. A thread on it has appeared on Google Support here: http://www.google.co.uk/support/forum/p/Google+Docs/thread?tid=2e06e4df9b05a817&hl=en

    I guess this is a problem with relying on things like Google Docs; when they break, there’s nothing you can do to fix them! I’ll keep an eye on it however and will post here when the issue is fixed.

  4. patrick
    July 15, 2010 at 10:24 am [ Permalink

    This is much appreciated and I’ll be giving it a try. Related, I noticed that Google has a search function which allows you to use its search engine to search for Tweets. Is there any way to extract #tweets from Google into a Google Doc/ Reason being, I am really interested in getting access to ALL tweets around #g20 which go into their 1000s but am trying to figure out the best way to do this!

  5. Siva Kumar
    August 6, 2010 at 10:19 am [ Permalink

    Thanks a ton! This is very useful. In addition to this I also need the original URL to be retrieved. The original url of the tweet. Please let me know hot to get this done.

  6. December 8, 2010 at 12:56 am [ Permalink

    Thank you so much for creating and sharing that!

  7. megler
    December 25, 2010 at 12:51 pm [ Permalink

    This project is BRILLIANT. One question, how can you export the comments, cut/paste, etc? I can’t seem to get the application to let me take any of the comments out so I can use them, (ie. paste to txt file, etc)

  8. December 30, 2010 at 10:50 am [ Permalink

    @megler, thanks for your feedback! It’s always good to get positive comments.

    I’ve had a look at the copy/paste problem you described and it seems fine to me – if I select individual or multiple fields in Google Spreadsheets, I can copy and paste them into text files. Can you give me more details of the steps you’re taking and I’ll see if there’s anything else I can do to help?

  9. imbenzene
    March 17, 2011 at 1:36 pm [ Permalink

    Hi

    Is any modification plausible that can convert like data into csv or xls very effectively. THough twitoaster provides API but its not feasile for non-programmers like me.
    All I need is conversation data, i.e. activity of a particular tweets of a user to be presented as replies or retweets as xls data.
    Any help would be highly appreciated.

  10. imbenzene
    March 17, 2011 at 1:37 pm [ Permalink

    Hi
    Is any modification plausible that can convert like data into csv or xls very effectively. THough twitoaster provides API but its not feasile for non-programmers like me.
    All I need is conversation data, i.e. activity of a particular tweets of a user to be presented as replies or retweets as xls data.
    Any help would be highly appreciated.

  11. March 23, 2011 at 9:22 pm [ Permalink

    This is very exciting. It’s awfully close to something I’m wanting to do, but not quite.

    I want to track what I read by sending tweets to a particular account whenever I start or stop a book. Is there a way to just follow one account and extract direct messages sent to it?

  12. 22ian22
    March 30, 2011 at 4:54 pm [ Permalink

    Does this still work? I made a copy of the excel and entered a search term then clicked Return. Fields on ever tab say #VALUE so I assume that’s the problem. There don’t seem to be formulas anywhere.

  13. March 30, 2011 at 5:06 pm [ Permalink

    @22ian22 – have you copied it into Excel? If so that may be why you’re seeing those errors. You need to use it inside Google Spreadsheets for it to work.

    The “Twitter search” tabs are working for me in Google Spreadsheets, but there does seem to be an issue with the Backtweets formulas. I’ll take a look into that and update when fixed…

  14. 22ian22
    March 30, 2011 at 7:05 pm [ Permalink

    I realized it may be from using real excel, so I duplicated everything in Google Docs and I’m getting the same thing.

    Also, to guarantee no area I just copied from your google doc to mine and same thing:(

  15. 22ian22
    March 30, 2011 at 7:06 pm [ Permalink

    WEIRD though, after 15 minutes of having a window with a bunch of errors after my last attempt, I’ve done absolutely nothing but all of the sudden the results are there now. What the heck??? lol

  16. 22ian22
    March 30, 2011 at 7:26 pm [ Permalink

    Looks like it started working when I logged out and logged back in lol. Weird, must be some google thing.

    Back on track though, now that I have this working I’ve been able to play around for a few min, and I thought of one other question. I see I can search for lets say ‘brelson’ to find tweets with that included. Do you have a way that you would recommend me use so I can look specifically at tweets that were sent TO brelson? so basically the keyword was referenced or in the tweet.

  17. October 9, 2011 at 4:33 pm [ Permalink

    Brendon-
    Thanks for the very useful and clearly explained solution. Is there any more information about how to extract ALL of the tweets relating to a particular keyword? Also, is it possible to use an advanced search script in the spreadsheet? I haven’t been able to do either yet.

    Thanks again,
    Wanda

  18. edidigit
    January 21, 2012 at 2:35 am [ Permalink

    may I know where I have to type the symbol for # or @ in before or after the name? in the Search Criteria?!

    Do you know the code for $ sign in twitter?

  19. February 12, 2012 at 10:27 am [ Permalink

    Thanks for such a great post. The make a copy function is dimmed. I downloaded a copy, but nothing works. I am very keen to get this working. I published few hours ago a presentation entitled Strategic Applications of Twitter. This work will enrich it.

    http://www.slideshare.net/hudali15/strategic-applications-of-twitter

  20. March 28, 2012 at 1:41 pm [ Permalink

    Dear Brendan I am trying to use your google file to extract geodata, but so far I cannot get results.

    I have been trying the function. 3

    http://search.twitter.com/search.json?rpp=25&geocode=34.0522342,-118.4911912,20mi&callback=?

    But get an error.

    Any ideas on this?

    Thanks!

    R

  21. April 21, 2012 at 5:38 pm [ Permalink

    Can this be used to extract twitter location API data as well?

  22. Kelly
    June 18, 2012 at 7:15 pm [ Permalink

    This is an awesome tool! Thank you! I wanted to save all tweets related to a recent conference I was at. I had to create two Google spreadsheets because there were so many pages of tweets but can’t seem to import more than 22 pages (start of my 3rd spreadsheet is the 23rd page). I can import earlier pages (1 through 22) on that 3rd spreadsheet but can’t get past 23 (and there are more than 23 pages of 20 tweets each). I get a #VALUE! with “error: the feed could not be retrieved. Please check the URL”. Before I spend more time debugging, does anyone know if there is a limit in the number of pages imported?

  23. nacho
    September 14, 2012 at 12:41 pm [ Permalink

    How can i search more than one word/hashtag simultaneously and get results concerning the set of words that i’ve defined?
    Do you know how to obtain geographic information?
    Many thanks in advanced for your post and precious help!

  24. October 19, 2012 at 2:27 pm [ Permalink

    Brilliant resource. This is incredibly useful. Thank you

  25. olla
    October 29, 2014 at 2:24 pm [ Permalink

    Great resouce, sadly it no longer works since both used sources changed their websites. Twitter even requires OAUTH 2.0 now

  26. Shan
    November 30, 2014 at 7:03 am [ Permalink

    Hi,
    Does this work now?
    Coz of twitter new API,
    Any luck how to use new Twitter OAUTH ?

  27. abbey
    December 1, 2014 at 3:15 pm [ Permalink

    Hi,
    I got #ERROR! when i try to use the above sample tutorial. What could possible be the problem.

    Regards
    Abbey

RSS feed for comments on this post. TrackBack URL

Leave a comment