1. A hedge fund based on Twitter may not be as stupid as it sounds

    Posted May 24, 2011 in comment

    Using online analytics and social media trends to predict real-world events is nothing new. Twitter’s been used to predict box-office sales, and Google search data has been telling us about future flu epidemics for a while now.

    Even I got in on the act, demonstrating back in 2009 that Google Insights could anticipate changes in UK unemployment figures.

    Financial difficulties searches versus unemployment, until April 2009

    UK unemployment rate charted against search volumes for 24 related keywords, from January 2004 to April 2009. Sources: Office for National Statistics, Google Insights

    Maybe I should have followed through with that idea, because there’s now a hedge fund that bases its investment decisions on data from Twitter. It’s called Derwent Capital Markets, it opened for business last week, and if its managers end up making a mint, there might well be a new bandwagon in town.

    So how do you run a hedge fund based on tweets? From what I understand of Derwent’s methodology, their algorithms measure the “calmness” of the Twittersphere – presumably based on sentiment analysis, which I’m a bit skeptical about. This is used to estimate the volatility of the Dow Jones Industrial Average index, with a three-day time lag.
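    Derwent has not published its method, so what follows is only a minimal sketch of the general idea, not the fund’s algorithm. It assumes that a daily Twitter “calmness” score already exists (however it is derived) alongside a daily measure of DJIA volatility, and checks how strongly each day’s calmness correlates with volatility three days later. The function names and the toy numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


def lagged_correlation(signal, target, lag):
    """Hypothetical helper: correlate signal on day t with target on day t + lag."""
    if len(signal) != len(target) or not 0 <= lag < len(signal) - 1:
        raise ValueError("series must have equal length and lag must leave at least two pairs")
    # Drop the last `lag` signal days and the first `lag` target days so that
    # each signal value is paired with the target value `lag` days later.
    return pearson(signal[:len(signal) - lag], target[lag:])


# Toy data (hypothetical): daily calmness scores, where higher means calmer.
calmness = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.5, 0.2, 0.8, 0.4]
# Contrived volatility series that mirrors calmness from three days earlier.
volatility = [0.5, 0.5, 0.5] + [1 - c for c in calmness[:-3]]

r = lagged_correlation(calmness, volatility, lag=3)
print(f"correlation at a three-day lag: {r:.3f}")
```

    Because the toy volatility series was built as a mirror image of calmness, the correlation here comes out at about -1. With real data, a strongly negative value would suggest that calm days on Twitter tend to precede quieter markets, but you would also want to compare several lags and test on data the model has not seen, since a single correlation can easily arise by chance.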

    This leaves a lot of unanswered questions. Does a non-calm day of Twitter conversations always correspond to a drop in the DJIA, or just to volatility? Are they trying to predict metrics such as trading volume as well as broader day-to-day movements in the overall index? And are they ranking Twitter users by credibility, or are spam bots given the same weight as financial journalists, economists and prominent investors?

    Obviously, algorithmic hedge funds aren’t about to disclose their inner workings, so questions like these will have to remain unanswered for now. But what of the other, larger, question – isn’t the whole idea just, well, a bit… silly?

    I can see why people might react in this way, and even I feel a bit skeptical about a fund that describes itself as a “social media-based hedge fund” yet apparently pulls data only from Twitter, when there are lots of other sources that could be tapped. But it would be wrong to dismiss the basic concept.

    Our everyday activities – web searches, page views, purchases, things we say on open social networks – leave a trail of data behind, which we tend to see as ephemeral or throwaway. We severely underestimate the value of this data, but Google doesn’t, Facebook doesn’t, and we shouldn’t either. This data becomes even more valuable when aggregated across entire countries, continents, or the planet as a whole. In fact, it could be argued that the predictive potential of aggregated global real-time data has yet to be fully imagined, let alone realised.

    The biggest problem with this resource is that we don’t really know how to exploit it yet. Things like Google Flu Trends or this Twitter-based hedge fund may be crude and experimental, and will definitely look even more so in five years’ time. Along the way there will be hype, bandwagonism, maybe even a stock market bubble, as real-time data is applied to real-world problems.

    But we need to make a start somewhere, and as silly as a Twitter-based hedge fund might sound, it’s as good a place to begin as any.

  2. Jan Pen’s striking method for picturing US income inequality

    Posted January 23, 2011 in ephemera

    The Economist’s article on the rise of the cognitive elite describes Jan Pen’s compelling way of explaining income inequality in the United States:

    Imagine people’s height being proportional to their income, so that someone with an average income is of average height. Now imagine that the entire adult population of America is walking past you in a single hour, in ascending order of income.

    The first passers-by, the owners of loss-making businesses, are invisible: their heads are below ground. Then come the jobless and the working poor, who are midgets. After half an hour the strollers are still only waist-high, since America’s median income is only half the mean. It takes nearly 45 minutes before normal-sized people appear. But then, in the final minutes, giants thunder by. With six minutes to go they are 12 feet tall. When the 400 highest earners walk by, right at the end, each is more than two miles tall.

    Via The Economist: “A special report on global leaders: The rise and rise of the cognitive elite”.
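    The arithmetic behind the parade is simple: a marcher’s height is their income divided by the mean income, multiplied by average height. The sketch below assumes an average height of 1.7 m and uses ten made-up incomes (not US data), chosen so that, as in the quote, the median is half the mean.

```python
from statistics import mean, median

AVERAGE_HEIGHT_M = 1.7  # assumed average adult height, for illustration only


def parade_heights(incomes, average_height=AVERAGE_HEIGHT_M):
    """Return marchers' heights in ascending order of income.

    Height is proportional to income, scaled so that someone on the mean
    income is exactly average height. Negative incomes (loss-making
    businesses) give negative heights: heads below ground.
    """
    mean_income = mean(incomes)
    if mean_income <= 0:
        raise ValueError("mean income must be positive")
    return [average_height * income / mean_income for income in sorted(incomes)]


# Hypothetical, deliberately skewed incomes: mean 100,000, median 50,000.
incomes = [-10_000, 0, 15_000, 30_000, 45_000, 55_000, 70_000, 95_000, 200_000, 500_000]
heights = parade_heights(incomes)

for income, height in zip(sorted(incomes), heights):
    print(f"{income:>9,} -> {height:6.2f} m")
```

    With these toy numbers the loss-making first marcher is below ground, the median height is 0.85 m (waist-high, half the average height) and the last marcher is 8.5 m tall. The real distribution is far more skewed at the top, which is how Pen’s final marchers reach two miles.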

  3. UK unemployment drops… unexpectedly?

    Posted January 21, 2010 in research, strategy

    The UK Office for National Statistics announced yesterday that unemployment had dropped for the first time in 18 months. BBC News reported this as a “surprise”:

    The number of people unemployed in the UK has fallen unexpectedly for the first time in 18 months… George Buckley [of Deutsche Bank] admitted previous predictions of the unemployment rate reaching 10% now looked unrealistic… [The figures] came as a surprise to many analysts.

    But to regular readers of this blog, this news is anything but unexpected: in December 2009, my analysis of unemployment-related search trends clearly indicated that the unemployment rate was about to fall.

    So could this be an example of search trends providing early insight into economic data? Possibly, but it’s only one month’s figures we’re talking about. A sustained track record of successful projections is needed to demonstrate that search analysis can yield valuable insights.

    Over 2010 I’ll be keeping an eye on the data to see what happens. In the meantime, if you can think of other real-world metrics that might be suitable subjects for search trends analysis, get in touch.

  4. Can search predict the future?

    Posted December 23, 2009 in research, strategy

    Today, we often search for information about upcoming major events in our lives – both good and bad – before we experience them. When facing financial difficulty or unemployment, many of us will go online at the earliest opportunity to look for help and guidance. And when we’re considering major financial decisions such as buying a house, search engines are usually consulted before estate agents are called.

    Traditional economic reports, on the other hand, look at events that have already taken place. Unemployment figures tell us how many people are claiming benefits rather than how many people have been put at risk of redundancy. Average house prices are based on completed transactions, not on how many people are currently looking to buy. So while we can be fairly confident in these reports, they don’t provide us with particularly current insights.

    This trade-off between confidence and currency was, in the past, largely academic, as analysing current data was almost impossible. But in the age of the real-time web, this might be about to change: maybe patterns in search behaviour can give us a glimpse of future patterns in the economy.

    We first became interested in this topic in spring 2009, when we analysed search patterns for two sets of keywords as the UK economy went into recession. We looked for relationships between these search patterns and related economic indicators, and listed some tentative predictions based on what we observed.

    House prices

    In April 2009, we looked at volumes for 23 keywords that homebuyers might use, including “buying a home”, “cheap mortgage” and “mortgage providers”. UK search volumes for these keywords were then compared to house prices.
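    The post doesn’t say exactly how the 23 keyword series were combined. One plausible approach, sketched here purely as an assumption, is to rescale each keyword’s monthly volume so that its own peak is 100 (similar in spirit to the relative figures Google Insights reported) and then average the rescaled series into a single index that can be charted against house prices. The helper names and all the numbers below are hypothetical.

```python
def rescale_to_peak(series, peak=100.0):
    """Scale a series so that its maximum value equals `peak`."""
    top = max(series)
    if top <= 0:
        raise ValueError("series must contain a positive value")
    return [peak * v / top for v in series]


def composite_index(volumes_by_keyword):
    """Average equal-length keyword series month by month, after rescaling
    each one so that no single high-volume term dominates the index."""
    rescaled = [rescale_to_peak(s) for s in volumes_by_keyword.values()]
    if len({len(s) for s in rescaled}) != 1:
        raise ValueError("all series must cover the same months")
    return [sum(month) / len(rescaled) for month in zip(*rescaled)]


# Hypothetical monthly volumes for three of the keywords named above.
volumes = {
    "buying a home": [40, 55, 80, 60],
    "cheap mortgage": [10, 12, 20, 18],
    "mortgage providers": [200, 260, 300, 240],
}
index = composite_index(volumes)
print([round(v, 1) for v in index])
```

    Rescaling before averaging is a design choice: without it, a term like “mortgage providers” with far larger raw volumes would swamp the smaller keywords.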

    House prices charted against search volumes for 23 related keywords, from January 2004 to April 2009. Sources: Nationwide, Google Insights

    Searches typically decline as autumn ends before rebounding in January. But in 2008, the January rebound was lacklustre and the decline came in spring – much earlier than usual. This was in line with house prices, which peaked in late 2007 and dropped severely from spring 2008.
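    The seasonal comparison in this paragraph and the next can be made concrete by measuring each year’s January rebound as the ratio of January volume to the preceding December’s. This is a minimal sketch, assuming a composite monthly index keyed by (year, month); the function name is hypothetical and the figures are invented for illustration, not taken from the chart.

```python
def january_rebounds(index_by_month):
    """Return {year: January volume / previous December volume}.

    `index_by_month` maps (year, month) tuples to a search-volume index.
    Years without a usable previous December are skipped.
    """
    rebounds = {}
    for (year, month), value in index_by_month.items():
        if month != 1:
            continue
        december = index_by_month.get((year - 1, 12))
        if december:  # skip missing or zero December values
            rebounds[year] = value / december
    return rebounds


# Invented figures for illustration only.
index_by_month = {
    (2006, 12): 60, (2007, 1): 78,
    (2007, 12): 58, (2008, 1): 61,
    (2008, 12): 40, (2009, 1): 56,
}
for year, ratio in sorted(january_rebounds(index_by_month).items()):
    print(year, round(ratio, 2))
```

    Here the 2008 rebound is the weakest of the three and the 2009 rebound the strongest, echoing the pattern described in the post, but only because the numbers were chosen that way.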

    In the first few months of 2009, however, search volumes enjoyed a far stronger January rebound than in the previous year – so we hypothesised that house prices would bottom out or even start to rise again in the middle of 2009. Let’s look at how accurate that hypothesis turned out to be.

    House price data to the present

    House prices charted against search volumes for 23 related keywords, from January 2004 to December 2009. Sources: Nationwide, Google Insights

    Sure enough, the search volume resurgence was accompanied by house price growth throughout 2009. But you’ll notice that search volumes soon tapered off, with a particularly steep fall after August. Our revised hypothesis, then, is that house prices will initially plateau and then drop again. We’ll revisit the statistics in spring 2010 to see how things turn out.

    Financial difficulties

    The second set of keywords we analysed was related to impending financial difficulties such as joblessness, debt and insolvency. They included “signing on”, “mortgage arrears” and “debt problems”, and were compared to the UK jobless rate.

    Financial problems searches versus unemployment

    UK unemployment rate charted against search volumes for 24 related keywords, from January 2004 to April 2009. Sources: Office for National Statistics, Google Insights

    These search volumes dip at the end of each year before rising in January – and the rise in early 2008 was more pronounced than in previous years. The jobless rate started climbing three months later, suggesting that in this case search patterns might anticipate economic statistics. We observed that search volumes had dropped significantly in the first few months of 2009, so our hypothesis was that the jobless rate would stabilise but not drop between April and July. The chart below shows what actually happened.

    Financial difficulties searches versus unemployment, until now

    UK unemployment rate charted against search volumes for 24 related keywords, from January 2004 to December 2009. Sources: Office for National Statistics, Google Insights

    The unemployment rate has indeed stabilised, wavering between 7.7% and 7.8% since early June, suggesting that our original hypothesis was valid. And search volumes have kept on dropping throughout 2009. If search trends do anticipate economic reports in this case, we should see the unemployment rate drop steadily between now and spring 2010. Again, we’ll revisit these figures in April to see if this happens.
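    The three-month gap between the early-2008 rise in searches and the rise in the jobless rate could be quantified with a simple lead-lag check: correlate the search index with the unemployment rate shifted forward by 0, 1, 2 and so on months, and keep the lag with the strongest positive correlation. This is a sketch under stated assumptions, not the method behind the charts above. The function name is hypothetical and the data is synthetic, with the unemployment series deliberately constructed to follow the search series three months later, so the function should recover a lag of 3.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


def best_lead(searches, rate, max_lag):
    """Return (lag, r): the lag in months, from 0 to max_lag, at which the
    search index correlates most strongly and positively with the later rate."""
    if len(searches) != len(rate) or max_lag > len(searches) - 2:
        raise ValueError("series must have equal length and leave at least two pairs")
    best = None
    for lag in range(max_lag + 1):
        # Pair searches in month t with the unemployment rate in month t + lag.
        r = pearson(searches[:len(searches) - lag], rate[lag:])
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Synthetic monthly data (hypothetical, not the ONS or Google figures).
searches = [50, 52, 48, 60, 75, 70, 65, 55, 45, 40, 42, 38]
# Rate constructed to follow the search index three months later.
rate = [5.0, 5.0, 5.0] + [4.0 + s / 20 for s in searches[:-3]]

lag, r = best_lead(searches, rate, max_lag=4)
print(f"strongest correlation at a lead of {lag} months (r = {r:.3f})")
```

    With real data, a high correlation at some lag is only suggestive: both series share seasonal patterns and longer-term trends, which can inflate correlations, so any apparent lead still has to be checked against later figures.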


    Our hypotheses from April 2009 were largely borne out as the year progressed: the drop in house prices was reversed and unemployment rates stabilised. So maybe there is some truth to the notion that search patterns can shed some light on forthcoming economic change.

    But these hypotheses were in tune with the economic mood of the time. Many commentators were talking about green shoots and a V-shaped recession – there was a feeling that recovery was just around the corner. Today, we remain in what has become the UK’s longest recession since records began, and there is considerable uncertainty about what 2010 will bring.

    Our new hypotheses are less likely to be tainted by current economic consensus, precisely because no real consensus seems to exist right now. For this reason, the idea of search predicting the future will be seriously tested as the year unfolds. Don’t forget to come back in April 2010 to see the results for yourself.

  5. Infographics at work

    Posted November 26, 2008 in media, visualisation

    Last night I watched IOUSA on the BBC iPlayer (unfortunately this was via cable TV – I can’t find it on the web version of iPlayer). It’s a documentary, built around the former US Comptroller General David Walker, that attempts to convince the viewer of the seriousness of America’s national debt problem.

    …and it worked on me. The most effective aspect of the film was its use of infographics to convey a sense of historical scale. At its core was a recurring animated graphic showing the national debt from America’s inception through to the end of the George W Bush era in 2008.


    Early on in the film you see the national debt rise from $0 in 1835 (the only point in history when it hit zero) up to the start of World War One. After that the graphic has to keep zooming out to fit in the subsequent growth. The Great Depression brings a rather unnerving hike, but as the World War Two period looms into view, the line looks like a sheer cliff face. This is a shot of the graphic running up to 1988:

    US national debt through to 1988

    In the Clinton era the debt comes down, but then Bush takes charge in 2001 and things go through the roof, rocketing past WW2’s peak. The final sequence involving this graphic displays a projection for debt growth through to 2040. Baby boomers are set to retire en masse shortly, and the effect on Social Security and Medicare spending will not be good. The effect this has on the infographic – the drastic zoom needed to chart the debt up to 2040 – almost gave me a sense of vertigo. It paints a pretty dystopian vision of the future.

    Pie chart

    Even though the film is unlikely to contain any new information for anyone with an advanced lay knowledge of the current economic situation, I’d strongly recommend watching it. As well as its extremely well-designed and well-animated graphics, it does a remarkably effective job of communicating the seriousness of the situation, even to viewers who are already aware of most of the facts.