Three Reasons Why Twitter’s New Streaming API Rocks

Republished from blog.oneriot.com.

The links that people share on Twitter are important signals for OneRiot’s realtime search engine. Broadly speaking, the more people share a link to a specific piece of content, and the faster the rate of sharing right now, the higher that content will appear in our search results. (You can read more detail about our ranking algorithm in this white paper.)

After almost a year of working with the team at Twitter and integrating their Search (aka REST) API, we recently started using the Twitter Streaming API and wanted to share with our developer friends and the greater tech community why we’re pumped about it:

1 – Data volume is fantastic

With Twitter’s Streaming API we are seeing almost twice as much data as before. The stream design is smart: Twitter now pushes data in realtime instead of waiting for 3rd party developers to ask for it. We could only maximize throughput from Twitter’s old REST API by using multiple threads, which caused duplicate tweets and missing data. Our team had to be very diligent about de-duping tweets and back-tracking to reduce the number of missed tweets, not to mention the complexity of multithreaded programming logic. The new Streaming API follows a good design pattern that lets the data flow in realtime without requiring a second thread. This means less complex programming logic, no more duplicate tweets, and maximum data volume. A huge improvement!
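To give a feel for the de-duping that the multithreaded REST approach required, here is a minimal Java sketch of a bounded "recently seen" set keyed by tweet ID. It is an illustration only, not OneRiot's code: the class name `RecentTweetIds`, the method `markIfNew`, and the idea of capping the set at a fixed capacity are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: remembers the most recently seen tweet IDs so that
// overlapping polling threads can drop tweets they have already handled.
class RecentTweetIds {
    private final Map<Long, Boolean> seen;

    RecentTweetIds(final int capacity) {
        // Insertion-ordered map that evicts its oldest entry once full,
        // so memory use stays bounded on a long-running feed.
        this.seen = new LinkedHashMap<Long, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns true the first time an ID is seen, false for a duplicate.
    // Synchronized because several polling threads may call it at once.
    synchronized boolean markIfNew(long tweetId) {
        return seen.put(tweetId, Boolean.TRUE) == null;
    }
}
```

Each polling thread would call `markIfNew(id)` and skip the tweet when it returns false. With a single pushed stream, this bookkeeping is no longer needed.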

2 – No more pesky HTTP 503 errors from the search API

The new Streaming API means far fewer interruptions to our data feeds from HTTP 503 (“Service Unavailable”) errors. The old API required us to build a special catch-up thread to make sure we didn’t miss any data during outages. This was a time-consuming and expensive problem. Since implementing the Streaming API we haven’t experienced any service availability issues and have eliminated our “catch-up” process.

3 – It’s easy to integrate

The Streaming API is simply easier to write code for. It took us less than two days to fully integrate the new API with a very small learning curve and a barebones system. I should also point out that the Twitter Streaming API is extremely well documented. (To be fair, so was the last one, but it should be noted that they did a great job with this documentation too!)

No matter what programming language you use (Ruby, Perl or Java), the integration is seamless. Here’s how we integrate the Twitter Streaming API at OneRiot:

Java is our programming language of choice because it’s fast to develop in while delivering high performance. We use the HttpClient library to connect to the Twitter stream. The tweets are returned as JSON, which we parse as it comes in on the stream. (Side note about JSON: we are pleased that Twitter supports JSON since it’s a lightweight format that’s quick to download, easy to read, and not as bulky as XML. Oh, and it also has less overhead while keeping the data clearly structured.) Lastly, we publish tweets using a traditional publisher-subscriber model. Since the Streaming API doesn’t require us to run a server, we have found that the traditional publisher-subscriber model is less daunting than PubSubHubbub, which is more complex and has server requirements.
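The pipeline described above (read the stream, pick up each tweet as it arrives, hand it to subscribers) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not OneRiot's implementation: the class names `TweetPublisher` and `TweetStreamReader` and the method `pump` are hypothetical, the `InputStream` stands in for the HTTP response body that an HTTP client would provide, and the raw JSON line is passed along unparsed where a real system would run it through a JSON parser. It relies on the Streaming API delivering one JSON status per line, with blank lines as keep-alives.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-process publisher: subscribers register a callback
// and receive every tweet that is published.
class TweetPublisher {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    void publish(String tweetJson) {
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(tweetJson);
        }
    }
}

// Hypothetical reader: consumes a newline-delimited JSON stream on a
// single thread and publishes each non-empty line as one tweet.
class TweetStreamReader {
    // Returns the number of tweets published before the stream ended.
    static int pump(InputStream in, TweetPublisher publisher) {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // blank keep-alive line, nothing to publish
                }
                publisher.publish(line); // a real system would parse the JSON here
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

A subscriber is just a callback, for example `publisher.subscribe(json -> System.out.println(json));`. Because the data is pushed over one long-lived connection, a single reader loop is enough.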

As you can tell, we are big fans of the Twitter Streaming API and highly recommend that any 3rd party developers who have not already converted do so.

2 comments to Three Reasons Why Twitter’s New Streaming API Rocks

  • I’m trying to make use of the new Streaming API (via the Twitter4j API). I notice on your website, you have managed to allow phrases to be entered, whilst making use of the Streaming API. How did you manage this? Using the filter method of the TwitterStream class (Twitter4j), the track argument takes a String array of keywords. Phrases are not permitted.

    Appreciate your insight & advice.

  • Mark,
    Yes, the Twitter Streaming API only supports keyword searches.

    Searching for phrases is possible on oneriot.com because we maintain our own search index that includes Twitter data. This is possible because we continuously read the stream and add that data to our search index in real-time.

    Regards,
    Nathaniel
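
The idea in the reply above (collect tweets from the stream, index them locally, then answer phrase queries from that index) can be illustrated with a tiny in-memory sketch. This is only one way to do it and says nothing about how OneRiot's index actually works: the class `LocalTweetIndex`, its methods, and the simple lowercase alphanumeric tokenizer are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical local index: keyword postings narrow the candidates,
// then each candidate is checked for the exact word sequence.
class LocalTweetIndex {
    private final List<String> tweets = new ArrayList<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(String text) {
        int id = tweets.size();
        tweets.add(text);
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(token, k -> new HashSet<>()).add(id);
        }
    }

    List<String> searchPhrase(String phrase) {
        List<String> words = tokenize(phrase);
        List<String> results = new ArrayList<>();
        if (words.isEmpty()) {
            return results;
        }
        // Step 1: keep only tweets that contain every word of the phrase.
        Set<Integer> candidates = null;
        for (String word : words) {
            Set<Integer> ids = postings.getOrDefault(word, Collections.<Integer>emptySet());
            if (candidates == null) {
                candidates = new TreeSet<>(ids);
            } else {
                candidates.retainAll(ids);
            }
        }
        // Step 2: keep only tweets where the words appear adjacent and in order.
        String needle = " " + String.join(" ", words) + " ";
        for (int id : candidates) {
            String haystack = " " + String.join(" ", tokenize(tweets.get(id))) + " ";
            if (haystack.contains(needle)) {
                results.add(tweets.get(id));
            }
        }
        return results;
    }

    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

The stream is still tracked by keyword. Phrase matching happens afterwards, against the locally stored tweets.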