Why the Push/Streaming API Architectural Style is More Prominent Than Ever
Back in 2014, unbeknownst to most ProgrammableWeb users (because the change was so deep under the hood), we launched the first phase of what would be a multi-phase migration of the data model behind our various directories, such as our flagship API directory and our fast-growing SDK directory (which, behind the scenes, is directly tied to the API directory). One of the key highlights of this journey was to take a single field that previously chronicled, all at once, an API’s architectural style (e.g., REST or RPC), supported request formats and types (e.g., URI-based CRUD queries or SOAP), and supported response formats and types (e.g., JSON, XML, or CSV) and break it up into dedicated fields for each.
One important reason for this move was to accommodate the trending and emergent architectural styles beyond REST. For example, even though it’s less than ideal to use the name of a specific API technology as the name of an architectural style, GraphQL is an emergent style that doesn’t fit neatly into any of the existing architectural styles like REST or RPC. For the purposes of power searching ProgrammableWeb (a forthcoming feature) and our charts and research, we needed the ability to isolate GraphQL-only APIs or APIs based on any of the other architectural styles (something we can do now, but couldn’t have easily done before).
One older architectural style that wasn’t getting the attention it deserved is what we call the Push/Streaming style of API: the one where the client side doesn’t have to regularly poll a server for the data it needs. Instead, the server essentially flings data at the client when it has something to fling. Other discussions might refer to this architectural pattern as “event-driven” or “publish/subscribe.” Although not entirely synonymous with one another, the different phraseology refers to the same basic idea that data is being pushed or streamed to the client instead of the client repeatedly polling the server to see if there’s new data it should know about.
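To make the contrast concrete, here is a minimal in-process sketch of the two patterns. It is not tied to any real API or network protocol; the class names (PollingServer, PushServer) and the run_demo helper are hypothetical and exist only for illustration. The point is the difference in who initiates the exchange: the polling client makes three requests to pick up one message, while the push client registers once and simply receives the message when it is published.

```python
from typing import Callable, Dict, List


class PollingServer:
    """Hypothetical server that holds messages until a client asks for them."""

    def __init__(self) -> None:
        self._messages: List[str] = []

    def publish(self, message: str) -> None:
        self._messages.append(message)

    def fetch_since(self, cursor: int) -> List[str]:
        # The client must call this repeatedly, even when nothing is new.
        return self._messages[cursor:]


class PushServer:
    """Hypothetical server that delivers each message as soon as it is published."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, message: str) -> None:
        # The server initiates delivery; subscribers never have to ask.
        for callback in self._subscribers:
            callback(message)


def run_demo() -> Dict[str, object]:
    # Polling: the client asks three times; only one request returns data.
    polling = PollingServer()
    polled: List[str] = []
    poll_requests = 0
    for tick in range(3):
        if tick == 1:
            polling.publish("price update")
        polled.extend(polling.fetch_since(len(polled)))
        poll_requests += 1

    # Push: the client subscribes once and receives data when it exists.
    push = PushServer()
    pushed: List[str] = []
    push.subscribe(pushed.append)
    push.publish("price update")

    return {"polled": polled, "poll_requests": poll_requests, "pushed": pushed}


if __name__ == "__main__":
    print(run_demo())
```

Both clients end up with the same data. What differs is the number of round trips and how soon the client learns about the update, which is the trade-off that the rest of this series explores.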
Twitter’s “Firehose” API is one of the better known examples of this. Although there are limitations to the non-Enterprise version of that API, the client side basically opens a persistent connection to Twitter’s streaming API (along with a filter parameter specifying which tweets to look for) and then Twitter’s streaming endpoint starts sending matching tweets back to the client in the form of JSON-formatted text. The stream persists until the client closes the connection or until the stream exceeds Twitter’s allowable rate limits for the public API.
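The flow described above (open a connection with a filter, receive JSON text, close when done) can be sketched without touching the network. The example below is a simulation under stated assumptions, not Twitter’s actual API: fake_streaming_endpoint, consume, the track parameter, the one-JSON-object-per-line framing, and the blank keep-alive lines are all hypothetical stand-ins chosen for illustration.

```python
import json
from typing import Dict, Iterable, Iterator, List


def fake_streaming_endpoint(track: str, source: Iterable[str]) -> Iterator[str]:
    """Hypothetical server side: applies the client's filter and emits JSON lines.

    A generator stands in for the persistent connection; a real service
    would write these lines to a long-lived HTTP response instead.
    """
    for text in source:
        if track.lower() in text.lower():
            yield json.dumps({"text": text})
        else:
            yield ""  # assumed keep-alive: a blank line when nothing matches


def consume(stream: Iterator[str], limit: int) -> List[Dict[str, str]]:
    """Hypothetical client side: reads lines until it has seen `limit` messages."""
    received: List[Dict[str, str]] = []
    for line in stream:
        if not line:
            continue  # skip keep-alive lines
        received.append(json.loads(line))
        if len(received) >= limit:
            break  # the client closes the connection
    return received


if __name__ == "__main__":
    posts = ["APIs are great", "lunch time", "streaming APIs rock", "more about apis"]
    for message in consume(fake_streaming_endpoint("apis", posts), limit=2):
        print(message["text"])
```

Note that the filter runs on the server side of the sketch, mirroring the description above: the client states what it wants once, and only matching messages travel over the connection.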
Unlike the emergent GraphQL, this idea of event-driven, non-polling-based data retrieval has been around in various forms for quite a while. But, for a variety of reasons, it was impractical for most organizations to contemplate. It wasn’t until the last year or so that interest in the push/streaming pattern started to gain some mainstream steam. Two catalysts of the rising interest in the push/streaming architectural style are (1) the on-demand and economically feasible availability of nearly infinite compute resources and (2) the increasing commoditization of machine learning and artificial intelligence.
Five years ago, the thought of consuming a data firehose like Twitter’s tweetstream or real-time stock data was inconceivable to most organizations. The sheer amount of computational power needed to process such streams was generally unavailable at a reasonable cost, and even organizations that could afford it still needed a team of PhDs to develop the science and code necessary to mine needles of gold out of the haystack.
Think about it. Through all of our dashboards and analytics, we are swimming in data, the majority of which we ignore because we simply don’t have the time or the patience to extract actionable information out of it. For many, the idea of adding more data to that pool, let alone streaming it in real-time, is sheer insanity.
Today, however, the conditions are much more favorable for organizations to convert such real-time big data into profits. Thanks to improved efficiencies and increased competition (Amazon vs. Google vs. Microsoft vs. Oracle), the public cloud is a nearly bottomless pit of inexpensive teraflops. Simply put, compute capacity is no longer a barrier to dealing with streams of big data in real time (which, in turn, should have you running towards big data ideas instead of away from them).
And now, neither are the PhDs. Today, there’s no need to recruit your own team of rocket scientists just to make sense of that data. Now, by way of dozens, hundreds, or thousands of public, machine-learning-based APIs (depending on when you read this), the ability to mine (in real time) game-changing answers out of all that streamed big data is available at a fraction of what it would have cost to do the same thing just a few years ago.
With all those teraflops and machine learning APIs waiting for data, the last ingredients are the streams of data and, of course, the client- and server-side technologies needed to deliver and consume them. Unfortunately for you, there’s no single standard way to do it. In other words, there’s a lot to figure out. Fortunately, however, the nuanced (and in some cases, not-so-nuanced) differences between the approaches make it possible to optimize a push/streaming architecture for your organization’s particular requirements.
With that in mind, the team here at ProgrammableWeb has assembled this special series to help you better understand your options when it comes to push/streaming API architectural styles, including some of the turnkey push/streaming API infrastructure providers should you decide to “buy rather than build.” And, as with all of ProgrammableWeb’s API University series, we view this series as living content. As new approaches and solutions come to light, we’ll update this series to reflect those developments and our findings as we dig into them.