Traditional search engines crawl the web by systematically following links between billions of pages, then indexing the content on those pages. Broadly, they treat each link as a signal pointing to an important piece of content.
OneRiot, in contrast, looks at realtime activity on the social web when deciding which pages to index. We treat the links people are tweeting, digging or sharing on other services as signals pointing to important pieces of content.
In the last two years the number of links shared across the realtime social web has exploded. This has been driven in part by the phenomenal growth of Twitter, but the realtime web is much wider than Twitter.
Services like Digg and Delicious, whose user communities provide a wealth of explicit “social signals” pointing to important content, also continue to grow. Meanwhile, sharing services like Shareaholic have promoted additional realtime sharing of content on the web, and URL shorteners like Bit.ly and TinyURL make it even easier for users. Facebook and other social networks make it easy to share links across users’ social graphs. Some of this information is publicly available; some is not. But a plethora of tools and services have made sharing links commonplace among the 230 million US internet users, and millions more internationally.
At OneRiot we aggregate that realtime activity across the social web, looking at the links people are sharing right now. We then crawl the pages those links point to and index their content, and we do it fast: currently a page is indexed and ready to search in less than 0.8 seconds.
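The aggregation step described above can be illustrated with a small, hypothetical sketch; it is not OneRiot's implementation, which isn't described here. It assumes share events arrive as (source, user, URL, timestamp) records, that short links such as Bit.ly URLs have already been expanded, and that lowercasing the host and stripping `utm_` tracking parameters is enough URL normalization. The names `ShareEvent`, `normalize_url` and `build_crawl_queue`, and the one-hour window and two-sharer threshold, are all illustrative choices.

```python
from collections import defaultdict
from dataclasses import dataclass
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


@dataclass(frozen=True)
class ShareEvent:
    """One hypothetical share of a link on some service."""
    source: str       # illustrative labels such as "twitter", "digg", "panel"
    user_id: str      # who shared the link on that service
    url: str          # assumed already expanded from any URL shortener
    timestamp: float  # seconds since the epoch


# Assumption: query parameters with these prefixes only track clicks.
TRACKING_PREFIXES = ("utm_",)


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop tracking params and the fragment."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if not k.startswith(TRACKING_PREFIXES)]
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", urlencode(query), ""))


def build_crawl_queue(events, now, window_seconds=3600.0, min_sharers=2):
    """Return normalized URLs to crawl, most widely shared first.

    Only events inside the recent time window count, and each
    (source, user) pair counts once per URL, so one account sharing
    a link repeatedly does not inflate its score.
    """
    sharers = defaultdict(set)  # url -> {(source, user_id), ...}
    sources = defaultdict(set)  # url -> {source, ...}
    for event in events:
        if now - event.timestamp > window_seconds:
            continue  # too old to count as "right now"
        key = normalize_url(event.url)
        sharers[key].add((event.source, event.user_id))
        sources[key].add(event.source)

    ranked = [(url, len(users), len(sources[url]))
              for url, users in sharers.items()
              if len(users) >= min_sharers]
    # More distinct sharers first, then more distinct services, then by URL.
    ranked.sort(key=lambda t: (-t[1], -t[2], t[0]))
    return [url for url, _, _ in ranked]


if __name__ == "__main__":
    events = [
        ShareEvent("twitter", "alice", "http://Example.com/story?utm_source=tw", 1000.0),
        ShareEvent("digg", "bob", "http://example.com/story", 1500.0),
        ShareEvent("twitter", "alice", "http://example.com/story", 1600.0),
        ShareEvent("twitter", "carol", "http://other.org/post", 1700.0),
    ]
    print(build_crawl_queue(events, now=2000.0))
```

Here the story shared by two distinct users on two services is queued, while the page shared by a single user falls below the threshold. Counting distinct sharers rather than raw shares is one simple way to keep a handful of very active accounts from dominating what gets crawled.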
It’s a completely new way to index the web. Effectively, users of the social web are curating the search index as they Tweet, Digg or share links on other services. Those pages inherently have social buzz and implicitly reflect “what’s going on right now” for their subject matter. Meanwhile, we provide the infrastructure to keep it all up to date in “realtime.”
In addition, OneRiot draws on its own panel of users to help determine which webpages get indexed. Similar to Compete.com and other internet measurement services, OneRiot manages a significant panel of users (almost 3 million strong at this point) who have opted in to pass back anonymous data about which pages matter to them as they surf the web.
Combining data from our own panel with realtime sharing activity on services like Twitter and Digg creates a huge realtime index of the web. While the volume of links shared on Twitter is exploding, those links account for only a fraction of the web pages in our index. This matters. Twitter undoubtedly provides a tremendously valuable stream of data for us, but Harvard Business Review recently reported that 10% of Twitter users create 90% of the content. A search index built exclusively from tweets would therefore be heavily biased toward the social activity of that small subset of power users.
So OneRiot’s search index is constantly updated with the web pages generating social buzz across the whole web right now, not just on one service. We index hyper-fresh, socially relevant pages: pages that perhaps haven’t been published long enough to build up a traditional ranking in Google. In other words, our index is full of potential results for the 40% of users performing Browse queries. When users want to know “what’s going on right now,” we have the pages indexed to help answer that question, powered by the social web and a lot of realtime infrastructure.

Thanks for the great insight into how OneRiot works. More generally, the article really highlights the importance of embracing the realtime/social web.
Regards, John