What is a WebSockets Push-Styled API and how does it work?

This is the third part of our series on push technologies. In part two, we looked at PubSubHubbub, a closely related cousin to Webhooks.

Push technologies must cover a range of data and user needs. Parts one and two of this series discuss techniques for providing what amounts to real-time server-to-server data exchange, with clients receiving pushed data after the fact. However, in some cases you need a real-time, interactive exchange between server and client, which is where WebSockets come into play. This part discusses what you can do with WebSockets and how they differ from other push technologies.

What is WebSockets?

WebSockets makes it possible for a client to open a connection to a server and then exchange event-driven messages with that server in real time. Unlike many Web technologies, WebSockets doesn't use a request/response strategy in which a connection is opened to make a request and closed once the response is delivered. With WebSockets, the connection remains open. Because the client doesn't have to continuously poll the server for updates, the application avoids the latency and overhead of repeated requests and uses resources more efficiently. A WebSockets setup consists of the following elements:

Client: Makes a request for specific data from the server.
WebSocket Gateway (used in some setups): Provides a WebSocket interface between client and server.
Server: Sends updates to the client, through the WebSocket gateway when one is present, in real time.

The intent of WebSockets is to provide an alternative to HTTP for two-way communication over TCP. The initial client request takes place over HTTP using an Upgrade request header, after which communication takes place using the WebSocket protocol. In other words, instead of sending a request to http://<something>, you send it to ws://<something> (or wss://<something> for a connection secured with TLS). Using this technique accomplishes the following goals:

Reduce HTTP header overhead by transferring only essential information: after the handshake, each frame carries a header of only 2 to 14 bytes instead of a full set of HTTP headers, which significantly reduces resource usage and makes real-time communication practical.
Create a full-duplex communication environment that removes the need for polling and the request/response architecture. Both client and server can push data in either direction whenever needed. The reduction in network traffic also increases application speed by cutting much of the latency normally encountered in Web communication.
Use a single TCP connection to reduce resource usage.
Maintain an open connection over TCP to make data streaming possible.
Overcome limitations with existing technologies such as:
- Polling: The client's periodic, scheduled requests to obtain information from the server waste a large amount of resources, because many of those requests return nothing new.
- HTTP Streaming: Even though the connection remains open all the time, the use of standard HTTP headers increases message size and reduces efficiency.
- Asynchronous JavaScript and XML (AJAX): Relies on the JavaScript XMLHttpRequest object to replace just part of the page for each update. The HTTP headers increase message size, reliance on half-duplex communication means using more TCP connections, and the need for the Web server to deliver content to individual clients increases Web server resource usage.
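Before any of these benefits apply, a client has to turn a ws:// or wss:// address into a host, a port, and a resource path for its opening request. The following Python sketch shows one way to do that with the standard library. It assumes the default ports RFC 6455 assigns (80 for ws://, 443 for wss://); the WsTarget and parse_ws_uri names are hypothetical and not part of any WebSocket library.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class WsTarget:
    host: str
    port: int
    resource: str  # path plus optional query, sent in the GET request line
    secure: bool   # True for wss:// (WebSocket over TLS)


def parse_ws_uri(uri: str) -> WsTarget:
    """Split a ws:// or wss:// URI into handshake parameters (illustrative sketch)."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    if scheme not in ("ws", "wss"):
        raise ValueError(f"not a WebSocket URI: {uri!r}")
    if parts.fragment:
        # RFC 6455 does not allow fragment identifiers in WebSocket URIs
        raise ValueError("WebSocket URIs must not contain a fragment")
    if not parts.hostname:
        raise ValueError("WebSocket URI has no host")
    secure = scheme == "wss"
    # Fall back to the default port for the scheme when none is given
    port = parts.port if parts.port is not None else (443 if secure else 80)
    resource = parts.path or "/"
    if parts.query:
        resource += "?" + parts.query
    return WsTarget(parts.hostname, port, resource, secure)


print(parse_ws_uri("ws://example.com/chat"))
print(parse_ws_uri("wss://example.com:8443/feed?topic=news"))
```

The secure flag tells the client whether to wrap the TCP socket in TLS before sending the handshake.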

To use WebSockets from a Web page, you need a compatible browser; current versions of Chrome, Edge, Firefox, Opera, and Safari all qualify, as do Internet Explorer 10 and later. The need for a compatible browser limits the usefulness of WebSockets only if you have a large client base that uses older browser technology. This article is based on a conventional WebSockets deployment involving a browser on the client side. However, server-to-server WebSocket integrations are also possible; WebSocket clients are available for most major server platforms, including Node.js, PHP, Python, Ruby, .NET, and Java.

How Does WebSockets Work?

WebSockets are relatively straightforward, but you can encounter a few variations. For example, some setups rely on a WebSocket gateway so that the existing server can keep running without modification, while others have the client interact directly with the server. The following diagram shows the most common setup.

Starting with a Handshake

In all cases, the client interacts with the WebSocket gateway or the server in the same order. The session begins with a handshake that relies on the HTTP GET method. The goal is to upgrade the HTTP connection to a WebSocket connection using request headers like those shown here:

GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

The request must use the HTTP GET method and target the resource that handles WebSocket connections, which is /chat in this case. The request must use HTTP/1.1 or later. The various headers perform specific tasks:

Host: Defines the host location.
Upgrade: Asks the server to switch the connection from HTTP to the WebSocket protocol.
Connection: Must include the Upgrade token, which tells the server that the client wants to switch protocols on this connection.
Sec-WebSocket-Key: Contains a randomly selected 16-byte value, base64-encoded, that the client generates anew for each connection, as RFC 6455 requires. It isn't an authentication credential issued by an API vendor; its purpose is to let the client confirm that the server actually understood the WebSocket handshake. This key is only used during the opening handshake and isn't the same as the key used to mask data (as explained later in the article).
Sec-WebSocket-Version: Specifies the WebSocket protocol version the client wants to use. RFC 6455 requires the value 13.
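The following Python sketch shows how a client might produce these headers. It generates Sec-WebSocket-Key the way RFC 6455 describes (16 random bytes, base64-encoded, new for every connection) and assembles the same request shown earlier. The function names are hypothetical, and a real client would also have to open the TCP (or TLS) socket, send these bytes, and read the response.

```python
import base64
import os


def make_sec_websocket_key() -> str:
    """Return a fresh Sec-WebSocket-Key: 16 random bytes, base64-encoded."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def build_handshake_request(host: str, resource: str, key: str) -> str:
    """Assemble the minimal opening-handshake request (illustrative sketch)."""
    lines = [
        f"GET {resource} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    # Each header line ends in CRLF, and a blank line ends the header block
    return "\r\n".join(lines) + "\r\n\r\n"


key = make_sec_websocket_key()
print(build_handshake_request("example.com", "/chat", key))
```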

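If the server can't handle the request, it responds with an HTTP error instead of completing the handshake, typically 400 Bad Request. When the server doesn't support the requested protocol version, RFC 6455 suggests 426 Upgrade Required, along with a Sec-WebSocket-Version header listing the versions the server does support. Otherwise, it sends an HTTP response with the following headers: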

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

The two essential pieces of information in this case are the status line, 101 Switching Protocols, and the Sec-WebSocket-Accept header. The header contains a value that proves to the client that the server understood its WebSocket handshake. If the client doesn't receive the expected value, it should treat the server as suspect and end communication. The server derives this value by concatenating the Sec-WebSocket-Key that the client originally sent with the fixed GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 defined in RFC 6455, taking the SHA-1 hash of the result, and base64-encoding that hash. Because the client generates a new key for each connection, the Sec-WebSocket-Accept value differs each time as well.
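The accept computation is easy to check for yourself. This Python sketch implements it with the standard library: SHA-1 over the client key concatenated with the RFC 6455 GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11, then base64. It also wraps the computation in a hypothetical build_handshake_response helper that answers 101 Switching Protocols, 400 Bad Request for a malformed upgrade request, or 426 Upgrade Required (with a Sec-WebSocket-Version header) for an unsupported version. It assumes the request headers were already parsed into a dictionary with lower-cased names; a production server would also check the GET method, the HTTP version, the Host header, and usually Origin.

```python
import base64
import hashlib
from typing import Dict

# Fixed GUID that RFC 6455 appends to the client's key
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Sec-WebSocket-Accept = base64(SHA-1(key + GUID))."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_handshake_response(headers: Dict[str, str]) -> str:
    """Minimal server reply to an upgrade request (illustrative sketch).

    `headers` maps lower-cased request header names to their values.
    """
    key = headers.get("sec-websocket-key")
    version = headers.get("sec-websocket-version")
    upgrade_ok = headers.get("upgrade", "").strip().lower() == "websocket"
    # Connection may list several tokens, e.g. "keep-alive, Upgrade"
    tokens = [t.strip().lower() for t in headers.get("connection", "").split(",")]
    if not key or version is None or not upgrade_ok or "upgrade" not in tokens:
        return "HTTP/1.1 400 Bad Request\r\n\r\n"
    if version.strip() != "13":
        # Unsupported version: advertise the one this server understands
        return "HTTP/1.1 426 Upgrade Required\r\nSec-WebSocket-Version: 13\r\n\r\n"
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {compute_accept(key)}\r\n"
        "\r\n"
    )


# The sample key from the request shown earlier
print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Feeding in the sample key from the request above reproduces the Sec-WebSocket-Accept value in the sample response, which is the same pair RFC 6455 uses as its example.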

It's important to realize that the headers described in this section are the minimal headers used by client and server. All the normal HTTP headers apply. For example, the server can send the client a cookie using the standard techniques.

Establishing and Using the WS Connection

Using the WebSocket protocol means working the WebSocket way, with events. Data moves between client and server as a series of messages made up of data frames (described later in the article). No matter which language you use, you need to either write custom code or use a library that can receive and interpret the four WebSocket events:

Open: Occurs when the initial handshake completes and the WebSocket connection is established between client and server.
Message: Occurs each time a message arrives while the connection is open. Client and server can both push messages over the same bidirectional TCP connection. Messages can take several forms, as described by an opcode.
Error: Signals that a communication problem has occurred. In the browser WebSocket API, the error event carries no error details and is typically followed by a close event.
Close: Occurs when the WebSocket connection closes. The closing handshake can carry a status code, which may signal a normal event; for example, status code 1000 indicates that the connection closed normally.
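In the browser API, a close status code such as 1000 arrives as the code property of the close event. On the wire, RFC 6455 places it in the first two bytes (big-endian) of the close frame's payload, optionally followed by a UTF-8 reason. This Python sketch encodes and decodes that payload and maps a few of the status codes the RFC defines to short descriptions; the function names and the description strings are illustrative, not taken from any library.

```python
import struct
from typing import Optional, Tuple

# A few close status codes defined in RFC 6455 (descriptions abbreviated)
CLOSE_CODES = {
    1000: "Normal closure",
    1001: "Going away",
    1002: "Protocol error",
    1003: "Unsupported data",
    1008: "Policy violation",
    1009: "Message too big",
    1011: "Internal server error",
}


def build_close_payload(code: int, reason: str = "") -> bytes:
    """Close-frame payload: 2-byte big-endian status code, then a UTF-8 reason."""
    payload = struct.pack("!H", code) + reason.encode("utf-8")
    if len(payload) > 125:  # control frame payloads are limited to 125 bytes
        raise ValueError("close payload too long")
    return payload


def parse_close_payload(payload: bytes) -> Tuple[Optional[int], str]:
    """Return (status code, reason); an empty payload carries no status code."""
    if not payload:
        return None, ""
    if len(payload) == 1:
        raise ValueError("close payload must be empty or at least 2 bytes long")
    (code,) = struct.unpack("!H", payload[:2])
    return code, payload[2:].decode("utf-8")


code, reason = parse_close_payload(build_close_payload(1000, "done"))
print(code, CLOSE_CODES.get(code), reason)  # 1000 Normal closure done
```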

Viewing the Low-level Details

When working with WebSockets, you might find it helpful to see the details of the communication between client and server. You can use products like Wireshark for this task, but Wireshark can be difficult to use and its output isn't easy to read. Fortunately, you have access to tools designed specifically for working with WebSockets, as shown in the following list:

Chrome 20 and Above
Firefox 25.0 and Above
Internet Explorer 11.0+, Chrome 32.0+, and Firefox 25.0+

Dealing with Data Frames

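WebSockets rely on data frames to send and receive information. Framing lets each endpoint tell where one message ends and the next begins, split large messages into fragments, and interleave control messages such as ping and close with data. Security comes from masking: because clients must mask the frames they send, an attacker can't craft byte sequences that intermediaries such as proxies might misinterpret, which protects against attacks like cache poisoning.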

The IETF's RFC 6455 specification for the WebSocket Protocol spells out a number of ways in which to use data frames, but essentially these frames are composed of bit-formatted data that relies on the following fields.

Final (1 bit): Set to 1 to show that this is the final fragment of a message. A message sent as a single frame has this bit set in that frame.
Reserved 1 through Reserved 3 (1 bit each): Set to 0 unless an extension negotiated during the handshake defines meanings for these bits.
Opcode (4 bits): Determines the kind of payload that the data frame carries, which can be any of the following values:
- x0: Continuation frame
- x1: Text frame start
- x2: Binary frame start
- x3-x7: Reserved for further non-control frames
- x8: Close connection
- x9: Ping (the start of a heartbeat message)
- xA: Pong (the response to a heartbeat message)
- xB-xF: Reserved for further control frames
Mask (1 bit): Set to 1 when the payload data is masked, in which case the Masking Key field carries the key. The client must mask all data it sends to the server for security reasons, and the server must not mask data it sends to the client.
Payload Length (7 bits, 7+16 bits, or 7+64 bits): Defines the payload data length in bytes. A payload of 125 bytes or less uses just the 7 bits of this field. When the initial 7-bit value equals 126, the following 16 bits hold the actual length (for payloads of up to 65,535 bytes). When the initial 7-bit value equals 127, the following 64 bits hold the actual length. There are other rules when using this field, but essentially the size of the field varies according to the amount of data you wish to send.
Masking Key: Contains the key used to unmask the data when the Mask field bit is set to 1. There is a standard algorithm used to mask and unmask the data as described in Section 5.3 of the specification.
Payload Data: The payload data is actually divided into two parts.
- Extension Data: Empty (0 bytes) unless an extension has been negotiated during the handshake. Any such extension specifies the length of this data and how to interpret it.
- Application Data: The actual data sent between two parties consumes the rest of the data frame.
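To see how these fields fit together, here is a minimal Python frame encoder. It is a sketch under simplifying assumptions: no extensions are negotiated, so the reserved bits stay 0 and there is no extension data, and the constant and function names are illustrative. It applies the payload length rules above (lengths in bytes) and the Section 5.3 masking algorithm, which XORs each payload byte with one of the four masking key bytes in rotation.

```python
import os
import struct
from typing import Optional

OPCODE_CONTINUATION = 0x0
OPCODE_TEXT = 0x1
OPCODE_BINARY = 0x2
OPCODE_CLOSE = 0x8
OPCODE_PING = 0x9
OPCODE_PONG = 0xA


def mask_bytes(data: bytes, key: bytes) -> bytes:
    """XOR byte i with key[i % 4]; masking and unmasking are the same operation."""
    return bytes(b ^ key[i % 4] for i, b in enumerate(data))


def encode_frame(payload: bytes, opcode: int = OPCODE_TEXT, fin: bool = True,
                 mask_key: Optional[bytes] = None) -> bytes:
    """Build one frame; pass a 4-byte mask_key when acting as a client."""
    first = (0x80 if fin else 0x00) | opcode  # Final bit, reserved bits 0, opcode
    mask_bit = 0x80 if mask_key is not None else 0x00
    n = len(payload)
    if n <= 125:
        header = struct.pack("!BB", first, mask_bit | n)
    elif n <= 0xFFFF:
        header = struct.pack("!BBH", first, mask_bit | 126, n)  # 7 + 16 bits
    else:
        header = struct.pack("!BBQ", first, mask_bit | 127, n)  # 7 + 64 bits
    if mask_key is None:
        return header + payload
    if len(mask_key) != 4:
        raise ValueError("masking key must be 4 bytes")
    return header + mask_key + mask_bytes(payload, mask_key)


def client_text_frame(text: str) -> bytes:
    """Clients must mask every frame, using a fresh random key each time."""
    return encode_frame(text.encode("utf-8"), OPCODE_TEXT, mask_key=os.urandom(4))


print(encode_frame(b"Hello").hex())  # 810548656c6c6f
print(encode_frame(b"Hello", mask_key=bytes.fromhex("37fa213d")).hex())  # 818537fa213d7f9f4d5158
```

Both printed values match the unmasked and masked single-frame "Hello" examples in RFC 6455.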

Thankfully, libraries handle data frames for you. When using an appropriate library, you seldom have to deal with data frames except when troubleshooting errors, in which case you really do need one of the tools discussed in the Viewing the Low-level Details section.
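When you do end up staring at raw bytes, a small decoder can help make sense of them. This Python sketch is a troubleshooting aid, not a conforming implementation: it decodes one frame from the start of a buffer, unmasks the payload, and reports how many bytes it consumed so you can walk a capture frame by frame. It omits checks a real endpoint must perform, such as rejecting nonzero reserved bits without a negotiated extension or control frames longer than 125 bytes. The Frame and decode_frame names are hypothetical.

```python
import struct
from typing import NamedTuple, Tuple


class Frame(NamedTuple):
    fin: bool
    rsv: int        # reserved bits 1-3 as a 3-bit number
    opcode: int
    masked: bool
    payload: bytes  # already unmasked


def decode_frame(data: bytes) -> Tuple[Frame, int]:
    """Decode the frame at the start of `data`; return (frame, bytes consumed)."""
    def need(n: int) -> None:
        if len(data) < n:
            raise ValueError("incomplete frame")

    need(2)
    b0, b1 = data[0], data[1]
    length = b1 & 0x7F
    pos = 2
    if length == 126:    # the next 16 bits hold the real length
        need(pos + 2)
        (length,) = struct.unpack_from("!H", data, pos)
        pos += 2
    elif length == 127:  # the next 64 bits hold the real length
        need(pos + 8)
        (length,) = struct.unpack_from("!Q", data, pos)
        pos += 8
    masked = bool(b1 & 0x80)
    key = b""
    if masked:
        need(pos + 4)
        key = data[pos:pos + 4]
        pos += 4
    need(pos + length)
    payload = data[pos:pos + length]
    if masked:           # unmasking is the same XOR as masking
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    frame = Frame(fin=bool(b0 & 0x80), rsv=(b0 >> 4) & 0x07,
                  opcode=b0 & 0x0F, masked=masked, payload=payload)
    return frame, pos + length


frame, used = decode_frame(bytes.fromhex("818537fa213d7f9f4d5158"))
print(frame, used)
```

For the sample bytes above (the masked "Hello" example from RFC 6455), the decoder reports a final, masked text frame (opcode 1) whose payload is b'Hello', having consumed all 11 bytes.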

Combining WebSockets with PubSubHubbub

It makes sense to combine WebSockets with PubSubHubbub. However, you also need to consider that the process is a little different in this case because you're looking for a real-time connection with a Web browser client that will close the connection when the session ends. This isn't a long-term publish/subscribe setup as you might have expected from Part 2 of this series. In this case, the process looks something like the series of events shown in the following diagram.

The actual process for creating the PubSubHubbub setup is the same as in Part 2, but now you have the added steps required by WebSockets. As you can see, the two technologies mesh nicely as long as you remember that the subscription is short term: it lasts only as long as the client is actually active. You can use the same feed as in the previous part of this series.

Screenshot of site about PubSubHubbub and WebSockets

Calling a WebSockets API

As with everything else in computing, putting all of this information into context often comes down to playing with some examples. One of the more interesting online examples is the websocket.org echo test. An immediate benefit of this example is that you can determine whether your browser actually supports WebSockets.

Notice that this example defaults to the WSS protocol (wss://), the secure version of the WS protocol, which runs WebSocket traffic over TLS, the successor to SSL. The site lets you choose whether to use a TLS-secured connection. The Log field shows the message traffic that occurs as you send and receive messages from the server using this browser client. You can also get a low-level view using any of the tools mentioned in the Viewing the Low-level Details section.

While this example is interesting, it doesn't really show the power of WebSockets.

Even though the interface looks much like any other Internet map application, you might find the smooth operation and immediate response amazing, especially if you don't enjoy the luxury of a truly high speed connection. The speed is the important part because it helps demonstrate the main benefit of using WebSockets in an easily understood and immediately verifiable form.

This is part three of our series on push technologies. In part four we will examine alternative push technologies and protocols.