Can Bit.ly solve real-time search?

One of the themes of my last post was that, for all the talk of new methods of social distribution, the methods of filtering information haven’t substantially changed; instead, we’re merely better able to identify smaller data sets to bring to bear on the issue. This got me thinking about the problems of real-time search.

Real-time search has been something of a bugbear. Google recently declared that real-time search is their biggest challenge, and a variety of companies have come out of the gate trying to compete in this space. Thus far, all of them do so by reducing the data set to something more manageable. Summize (now Twitter Search), the most famous example, searches only Twitter, while others like One Riot search Twitter, Digg and other social sites. The issue is that while these services manifestly beat the likes of Google at providing real-time results, finding relevant information is still difficult. Eric Wiesen points out that ‘“Now” is actually a pretty bad filter for a tremendous proportion of content’. So even when searching for information on stories like the plane in the Hudson, where Summize is generally perceived to have excelled, one still finds oneself wading through an awful lot of hay to find the needle.

However, I think that bit.ly has the potential to approach this space from a slightly different angle and provide real value. Bit.ly is now processing more than 100 million decodes (clicks on its shortened links) a week. This gives them, in essence, a real-time global heatmap of what is most relevant to people at this moment in time. Some of the data they surface may not have been published as recently as the last post on Twitter, but it is likely to be relevant to the real-time needs of the user doing the search. And while a lot of bit.ly links come from Twitter, bit.ly is not constrained to any particular service in the way that other real-time search engines are.

The mere fact that someone has chosen to share this information is indicative of a certain level of relevance, but bit.ly’s ability to measure the virality of a given link enables them to rank the relative relevance of those links too. So what they have is a different pathway to achieving something very close to what Google can do with PageRank, applied to the limited dataset that real-time search requires. It will be interesting to see whether bit.ly chooses to go down this path, but I think it’s reasonable to suggest that bit.ly has the potential to be a serious challenger in this space, and it is certainly much, much more than just ten lines of code.
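To make the idea of ranking by virality concrete, here is a minimal sketch of how a click log could be turned into a "what matters right now" ranking. It is an illustration only, not a description of how bit.ly actually works: the log format (timestamp, URL, title), the one-hour window and the function name rank_links are all assumptions made for the example.

```python
from collections import Counter


def rank_links(clicks, now, window_seconds=3600, query=None):
    """Rank URLs by how many clicks they received in the recent window.

    clicks: iterable of (timestamp_seconds, url, title) tuples.
            This log format is hypothetical, chosen for illustration.
    now: current time in seconds, on the same clock as the timestamps.
    query: optional search term; only titles containing it are counted.
    """
    counts = Counter()
    for timestamp, url, title in clicks:
        # Ignore clicks outside the window (too old, or in the future).
        if timestamp > now or now - timestamp > window_seconds:
            continue
        # Apply the searcher's intent as a simple substring filter.
        if query and query.lower() not in title.lower():
            continue
        counts[url] += 1
    # Most-clicked links first: the click count stands in for relevance.
    return counts.most_common()


clicks = [
    (100, "http://example.com/a", "Plane in the Hudson"),
    (110, "http://example.com/a", "Plane in the Hudson"),
    (120, "http://example.com/b", "Hudson rescue photos"),
    (10, "http://example.com/c", "Old Hudson story"),
]
print(rank_links(clicks, now=130, window_seconds=60, query="hudson"))
```

The point of the sketch is that the link with the most recent clicks ranks first even if it was published earlier than its rivals, which is the difference between filtering on activity and filtering on "now".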


All this has happened before…

… and all this will happen again.

John has written a great post bringing together a number of strands of his thinking around social distribution and the real-time web. One of the most interesting points he brings up is that we would be wise to move away from the nomenclature of pages, single, contained items of information that can be consumed wholly, and instead think about streams of data where the idea of total consumption is a mirage.

‘A stream. A real time, flowing, dynamic stream of information - that we as users and participants can dip in and out of and whether we participate in them or simply observe we are a part of this flow…. This world of flow, of streams, contains a very different possibility set to the world of pages. Among other things it changes how we perceive needs. Overload isn’t a problem anymore since we have no choice but to acknowledge that we can’t wade through all this information. This isn’t an inbox we have to empty, or a page we have to get to the bottom of - it’s a flow of data that we can dip into at will but we can’t attempt to gain an all encompassing view of it. Dave Winer put it this way in a conversation over lunch about a year ago. He said “think about Twitter as a rope of information - at the outset you assume you can hold on to the rope. That you can read all the posts, handle all the replies and use Twitter as a communications tool, similar to IM - then at some point, as the number of people you follow and follow you rises - your hands begin to burn. You realize you can’t hold the rope you need to just let go and observe the rope”.’

John makes a strong point here, but I can’t help but think that we are seeing a cycle very similar to one we’ve seen before with the web. Small subsets of content begin to expand at an exponential rate and then take on the evolutionary path of their parent type. Early attempts at filtering the web itself, such as the early Yahoo, aggregated a variety of content into a single destination. As the volume of content grew, these aggregators developed a more curatorial approach, creating portals to the web. These were unable to keep up with the pace and personalization of the web and were disrupted and diminished by the advent of Google and search.

Now the same pattern is occurring within the subset of the web that is the personal web, that portion of the web made up of the content created and distributed by ourselves and our explicit social connections. We began with lifestreams and newsfeeds, aggregating our data into a single destination site. We’ve seen some of these (such as Friendfeed) evolve to offer a more curatorial approach, using various algorithms and other methods to filter content more effectively. However, when Dave Winer talks about the burning rope of information, I can’t help but think we are in that median stage between the failure of externally curated content and the rise of a Google for the personal web. It’s as if we’re peeking in to watch the indexing of our personal web while we wait for the search box to be built.

Google *is* social distribution

What is sometimes overlooked in the discussion about the rise of social distribution is that PageRank, the overarching filter of Google, is essentially social recommendation writ large. The greater the number of links to an item (i.e. the greater the number of people who ‘liked’ this content), the more prominently it is shown when we engage with the service. It may not be as explicit as a link we see in Twitter, but we are still reaching a particular nugget of content because enough people voted this content up by linking to it. So, from this viewpoint, the solution to the problem of the burning rope within the context of the wider web was a search box where I could express the intention of the moment to a social recommendation engine that would return content based upon what the largest network I am involved with ‘likes’.
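For readers who want to see the "links as votes" idea in code, here is a minimal sketch of the textbook PageRank iteration. It is a simplification and certainly not Google's production system: the damping factor of 0.85 and the fixed iteration count are conventional defaults assumed for the example, and the three-page link graph is made up. One refinement over simply counting links is visible here: a vote from a highly ranked page is worth more than a vote from an obscure one.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its score; scores sum to 1.
    """
    # Collect every page that links out or is linked to.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: a and b both link to c, and c links back to a.
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, c comes out on top because two pages vote for it, and a outranks b because its single vote comes from the best-ranked page.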

Having seen how closely the evolution of the subset of the web I am calling the personal web is tracking against its larger parent, I can’t help but think that as much as we might relish dipping into the stream and coming across serendipitous information at this time, the ever-growing torrent of content will make this increasingly unwieldy. At some point we will once again turn to a service that effectively combines the user’s intention of the moment (a directed contemporary filter) with a social recommendation engine (content filtered by the activity of those within the personal web).

So, bottom line, I love and agree with the concept of the stream overtaking the page, but I think that as we go from dipping into the stream to being borne away by the torrent, relying on an essentially serendipitous approach is not going to be a feasible long-term strategy. Instead, it will be the service that can index and rank your personal web AND provide some way to signify contemporary intention that will have to evolve.
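As a rough illustration of what such a service might do at its simplest, the sketch below combines the two ingredients: a query (the intention of the moment) and a ranking based on how many of your own connections shared each item (the social recommendation). Everything here is assumed for the example, including the shape of the share data, the substring matching and the function name search_personal_web; a real service would need far richer signals.

```python
from collections import defaultdict


def search_personal_web(shares, my_connections, query):
    """Return (url, sharer_count) pairs matching the query, best first.

    shares: iterable of (person, url, text) tuples (hypothetical format).
    my_connections: set of people in the searcher's personal web.
    query: the searcher's intention of the moment, as a plain string.
    """
    sharers = defaultdict(set)
    for person, url, text in shares:
        # Only activity from explicit connections counts as a vote.
        if person not in my_connections:
            continue
        # Only items matching the current intention are considered.
        if query.lower() not in text.lower():
            continue
        sharers[url].add(person)
    # Rank by number of distinct connections; break ties by URL for stability.
    results = [(url, len(people)) for url, people in sharers.items()]
    return sorted(results, key=lambda item: (-item[1], item[0]))


shares = [
    ("ann", "http://example.com/1", "Hudson plane landing"),
    ("bob", "http://example.com/1", "hudson plane landing"),
    ("ann", "http://example.com/2", "Hudson river photos"),
    ("zed", "http://example.com/3", "Hudson plane"),
    ("bob", "http://example.com/4", "Cat video"),
]
print(search_personal_web(shares, {"ann", "bob"}, "hudson"))
```

The same item shared by a stranger (zed, in this made-up data) does not appear at all, which is what distinguishes a search over the personal web from a search over the whole stream.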
