How GraphQL delivers on the original promise of the Semantic Web. Or not.

Table of Contents

1. Simplifying the Semantic Web Using GraphQL

2. GraphQL and the Semantic Web: Good But Not Complete

3. Next Steps: Moving From Theory to Practice

This is Part 4 of the ProgrammableWeb API University Guide to GraphQL: Understanding, Building and Using GraphQL APIs.

In previous installments in this series, we looked at the history that led up to the advent of GraphQL. We also looked at the GraphQL specification and at a real-world GraphQL API that we created using Apollo Server 2.0, a Node.js-based implementation of the GraphQL specification. Now we're going to take a step back and look at GraphQL in terms of the Semantic Web. The Semantic Web is an extension of the World Wide Web that's intended to enable machines to search and understand the information on the Internet in a meaningful way. The Semantic Web has been quietly influencing the evolution of the World Wide Web since the early days of web page publishing.

In this installment, we're going to look at how the Semantic Web came to be. We'll look at the historical evolution of the technologies that emerged to meet its requirements. Then, once we understand the historical underpinnings, we'll look at how GraphQL realizes some of the promise of the Semantic Web, now and going forward. To be clear, though, living up to the promise of the Semantic Web was never a stated objective of GraphQL's inventors. It's just that the similarities are too striking to ignore. If you're familiar with the Semantic Web, you'll be pleasantly surprised by some of GraphQL's advancements. If you're not familiar with the Semantic Web, it's worth learning what it attempted to achieve and how GraphQL delivers on some of that potential.

Understanding the Semantic Web

In order to understand the Semantic Web, you need to understand the problem it attempts to solve — how to create a way to describe entities and relationships that are exposed on the Internet in a standard format that is machine-understandable.

The Semantic Web is an idea that has been around since the introduction of the Internet to mainstream computing. An article published by Tim Berners-Lee, James Hendler, and Ora Lassila in a 2001 issue of Scientific American brought the Semantic Web into the mainstream. According to Berners-Lee et al.,

"The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better-enabling computers and people to work in cooperation."

The Scientific American article built on ideas previously described in the Resource Description Framework (RDF), which was adopted by the W3C in 1999. (The concepts described in RDF have become central to the Semantic Web, as we'll examine soon.)

In order to realize the vision of the Semantic Web, three requirements need to be satisfied. These requirements are:

1. There needs to be a way to represent data entities in a standardized, self-describing format that is universally machine-understandable.
2. There needs to be a standardized way to describe any one of the potentially infinite number of relationships that can exist between any two entities.
3. There needs to be a standardized way to determine the meaning of metadata applied to a given data entity. For example, there must be a way to determine whether the attribute title describes the name of a book or a prefix applied to the name of a person, as in Moby Dick vs. Doctor.

Let's take a look at the way web technology has evolved to satisfy these requirements.

Implementing a Self-Describing Data Format

The first requirement, describing data entities in a standard, self-describing format, has been satisfied for a while. Today the most common formats for publishing data on the web are XML (as shown in Listing 1) and JSON (Listing 2).

<persons>
    <person id="101" firstName="David" lastName="Bowie" dob="1947-01-08" />
    <person id="102" firstName="Nicholas" lastName="Roeg" dob="1928-08-15" />
    <person id="103" firstName="Rip" lastName="Torn" dob="1931-02-06" />
    <person id="104" firstName="Candy" lastName="Clark" dob="1947-06-20" />
    <person id="105" firstName="Mick" lastName="Jagger" dob="1943-07-26" />
    <person id="106" firstName="Buck" lastName="Henry" dob="1930-12-09" />
</persons>

Listing 1: XML is a standard way to format data for publication on the internet

{
    "persons": [
        {"id": 101, "firstName": "David", "lastName": "Bowie", "dob": "1947-01-08"},
        {"id": 102, "firstName": "Nicholas", "lastName": "Roeg", "dob": "1928-08-15"},
        {"id": 103, "firstName": "Rip", "lastName": "Torn", "dob": "1931-02-06"},
        {"id": 104, "firstName": "Candy", "lastName": "Clark", "dob": "1947-06-20"},
        {"id": 105, "firstName": "Mick", "lastName": "Jagger", "dob": "1943-07-26"},
        {"id": 106, "firstName": "Buck", "lastName": "Henry", "dob": "1930-12-09"}
    ]
}

Listing 2: JSON is also a widely supported way to format data for publication on the internet

Both XML and JSON are considered self-describing formats. In other words, the fields of the data structure are apparent. When you look at Listings 1 and 2 above, you have no problem determining that there are attributes named id, firstName, lastName, and dob. The attribute names are embedded in the data structure. You don't need an external reference to figure things out.
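Here is a minimal Python sketch of what "self-describing" means in practice. It uses only the standard library and parses trimmed copies of Listings 1 and 2. Neither parser is told which fields to expect, yet both recover the same attribute names from the data itself. The function names are illustrative.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed copies of Listings 1 and 2 (two records each, for brevity).
XML_DOC = """<persons>
    <person id="101" firstName="David" lastName="Bowie" dob="1947-01-08" />
    <person id="102" firstName="Nicholas" lastName="Roeg" dob="1928-08-15" />
</persons>"""

JSON_DOC = """{"persons": [
    {"id": 101, "firstName": "David", "lastName": "Bowie", "dob": "1947-01-08"},
    {"id": 102, "firstName": "Nicholas", "lastName": "Roeg", "dob": "1928-08-15"}
]}"""

def xml_field_names(doc: str) -> list[set[str]]:
    """Return the attribute names found on each <person> element."""
    root = ET.fromstring(doc)
    return [set(person.attrib) for person in root.findall("person")]

def json_field_names(doc: str) -> list[set[str]]:
    """Return the key names found on each object in the "persons" array."""
    data = json.loads(doc)
    return [set(person) for person in data["persons"]]

xml_fields = xml_field_names(XML_DOC)
json_fields = json_field_names(JSON_DOC)

# No schema was supplied; the names come straight out of the documents.
print(sorted(xml_fields[0]))      # ['dob', 'firstName', 'id', 'lastName']
print(xml_fields == json_fields)  # True
```

The sketch also shows the limits of self-description. The parsers find the names, but nothing tells them what dob means. And id is a string in the XML but a number in the JSON.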

Defining Relationships Between Entities

While XML and JSON are self-describing in terms of attributes, neither format offers a way to describe relationships in and between entities beyond a few simple, implicit ones. A "has a" relationship is implicit in any standard entity definition. For example, a person has a first name; a person has a last name; a person has a date of birth. There are also "has some" relationships, for example, "a movie has some actors." The "has some" relationship implies that one entity is related to a number of other entities in some sort of organizational hierarchy. Listing 3 shows this in JSON format: the movie The Man Who Fell to Earth has some actors.

{
    "id": 4001,
    "title": "The Man Who Fell to Earth",
    "releaseDate": "1976-04-18",
    "director":{"id": 102, "firstName": "Nicholas", "lastName": "Roeg", "dob": "1928-08-15"},
    "actors" : [
        {"id": 101, "firstName": "David", "lastName": "Bowie", "dob": "1947-01-08"},
        {"id": 103, "firstName": "Rip", "lastName": "Torn", "dob": "1931-02-06"},
        {"id": 104, "firstName": "Candy", "lastName": "Clark", "dob": "1947-06-20"},
        {"id": 106, "firstName": "Buck", "lastName": "Henry", "dob": "1930-12-09"}
    ]
}

Listing 3: A "has some" relationship is implicit when an entity has an attribute that is an array of other entities

Defining a "has some" relationship requires nothing more than assigning an array to an attribute of the entity. Thus, it's implicit.
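JSON expresses only these two structural relationships, and this minimal Python sketch makes that visible. It classifies each attribute of a trimmed copy of Listing 3 as "has some" when the value is an array and as "has a" otherwise. The implicit_relationships function is a hypothetical helper, not part of any library.

```python
import json

# Trimmed copy of Listing 3 (two actors instead of four, no dob fields).
MOVIE = json.loads("""{
    "id": 4001,
    "title": "The Man Who Fell to Earth",
    "releaseDate": "1976-04-18",
    "director": {"id": 102, "firstName": "Nicholas", "lastName": "Roeg"},
    "actors": [
        {"id": 101, "firstName": "David", "lastName": "Bowie"},
        {"id": 103, "firstName": "Rip", "lastName": "Torn"}
    ]
}""")

def implicit_relationships(entity: dict) -> dict[str, str]:
    """Label each attribute with the only relationship the structure expresses:
    an array value reads as "has some", anything else as "has a"."""
    return {
        name: "has some" if isinstance(value, list) else "has a"
        for name, value in entity.items()
    }

for attribute, relation in implicit_relationships(MOVIE).items():
    print(f"movie {relation} {attribute}")
```

Note that director comes out as a plain "has a", even though its value is a full person entity. The structure says nothing about how Roeg relates to the movie beyond containment.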

The Problem of Complex Relationships

Self-description gets tricky when it comes to describing relationships that are not implicit, or when more than one relationship exists between two entities. For example, Nicholas Roeg knows David Bowie, and Nicholas Roeg also likes David Bowie.

The HTML specification attempts to address the problem of describing complex relationships by way of the rel attribute and, up until HTML5, the rev attribute. The rel attribute can be used in <a>, <area>, and <link> tags to describe the relationship between the current document and the linked document or resource.

A convention has evolved for using the rel attribute. For example, it's commonly used within the <link> tag to bind a document to a stylesheet, like so:

<link rel="stylesheet" href="main.css" type="text/css" media="screen"/>

The attribute can also carry an HTML microformats keyword such as home. The following example uses it to describe the destination of a link as the site's home page.

<a href="http://example.com" rel="home">Home</a>

Multiple, space-delimited values can be assigned to the rel attribute. Thus, the attribute can define multiple relationships, as shown in the following example.

<link rel="alternate stylesheet" title="Better Styling" href="better.css" type="text/css"/>

The rel attribute makes it theoretically possible to use web pages to publish data to the Semantic Web. Listing 4 below shows an HTML page that lists connections to the profile of the entity, Nicholas Roeg. Notice that the rel attribute is used to define the variety of relationships that the Nicholas Roeg entity has to the other entities in the unordered list.

<html>
<head>
<title>Profile</title>
</head>
<body>
  <div>Nicholas Roeg</div>
  <div>1928-08-15</div>
  <div>
    <div>Connections</div>
    <div>
      <ul>
        <li><a rel="knows workedWith likes" href="https://en.wikipedia.org/wiki/David_Bowie">David Bowie</a></li>
        <li><a rel="knows workedWith" href="https://en.wikipedia.org/wiki/Rip_Torn">Rip Torn</a></li>
        <li><a rel="knows workedWith" href="https://en.wikipedia.org/wiki/Candy_Clark">Candy Clark</a></li>
        <li><a rel="knows workedWith likes" href="https://en.wikipedia.org/wiki/Buck_Henry">Buck Henry</a></li>
        <li><a rel="knows workedWith" href="https://en.wikipedia.org/wiki/Mick_Jagger">Mick Jagger</a></li>
        <li><a rel="knows marriedTo" href="https://en.wikipedia.org/wiki/Susan_Stephen">Susan Stephen</a></li>
        <li><a rel="knows workedWith marriedTo" href="https://en.wikipedia.org/wiki/Theresa_Russell">Theresa Russell</a></li>
      </ul>
    </div>
  </div>
</body>
</html>

Listing 4: The HTML rel attribute can be used to describe relationships between a parent document and a linked document.

While it is possible to use HTML to publish semantic data to the web, it's a limited approach in the real world. In terms of machine consumption, the parsing alone is daunting. In addition, whenever a relationship among any of the entities changes, the HTML has to be updated. Finally, getting a clear picture of the semantics on a page requires inferring a lot about the information that's marked up. In the example web page shown above, there are a lot of implications in play. For example, there's the implication that Nicholas Roeg is the subject of the web page on which the unordered list is hosted. Also, the order of the strings in each list item implies first-name and last-name values. And the association of the name in the list item with the hyperlink is implied too. A simple change in content will corrupt a significant part of the page's semantics: say, mistakenly changing the string Mick Jagger to Merle Jagger (a Western rock band in California).
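Here is a minimal Python sketch of that parsing problem, using only the standard library. The hypothetical RelExtractor class pulls relationships out of two list items copied from Listing 4. It has to be handed the subject, Nicholas Roeg, explicitly, because nothing in the markup says who the links belong to.

```python
from html.parser import HTMLParser

# Two list items copied from Listing 4.
PAGE = """
<div>Nicholas Roeg</div>
<ul>
  <li><a rel="knows workedWith likes" href="https://en.wikipedia.org/wiki/David_Bowie">David Bowie</a></li>
  <li><a rel="knows marriedTo" href="https://en.wikipedia.org/wiki/Susan_Stephen">Susan Stephen</a></li>
</ul>
"""

class RelExtractor(HTMLParser):
    """Collect (subject, relationship, target URL) tuples from <a rel="..."> links."""

    def __init__(self, assumed_subject: str):
        super().__init__()
        # The page never states the subject, so it must be supplied from outside.
        self.subject = assumed_subject
        self.triples: list[tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("rel"):
            # rel holds space-separated values: one relationship per token.
            for relation in attributes["rel"].split():
                self.triples.append((self.subject, relation, attributes.get("href") or ""))

parser = RelExtractor(assumed_subject="Nicholas Roeg")
parser.feed(PAGE)
for triple in parser.triples:
    print(triple)
```

Nothing in the extracted data says that "knows" here means the same thing as "knows" on any other site. That is the gap RDF and vocabularies address next.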

As you can see, using the HTML rel attribute in this elementary manner is an obscure and brittle way to publish semantic data to the web. Clearly, a more precise way is needed. Fortunately, the Resource Description Framework (RDF) provided the mechanisms that were needed to move forward.

Applying RDF to Create Well Defined Data

The Resource Description Framework (RDF) is a set of specifications defined by the World Wide Web Consortium (W3C) that are intended as a model for describing data on the Internet. RDF addresses the second and third requirements for publishing data to the Semantic Web described at the beginning of this article. RDF provides the standardization necessary to describe any one of the potentially infinite number of relationships that can exist between any two data entities. It also provides a standardized way to determine the meaning of metadata applied to a given data entity.

We can use concepts described in RDF not only to make the semantic description in a web page's HTML more precise but, more importantly, to make any data we choose semantically robust.

The place to start is with the RDF concept of a triple.

Using Triples to Define Relationships

A triple describes the semantic relationship between two things. As the name implies, a triple is made up of three parts: the subject, the predicate, and the object. The same structure underlies ordinary sentences in human language. Take the example "Bob likes fish." The subject is Bob, the predicate is likes, and the object is fish. Figure 1, below, shows a graphical description of the sentence as a triple.

Figure 1: A triple describes a relation in three parts — subject, predicate, and object

Notice that the triple diagram in Figure 1 illustrates two circles connected by a line. In graph theory, the circles are called vertices and the line is called an edge. Vertices (aka nodes) and edges are important concepts that we'll use when we discuss working with data in GraphQL.

Triples are useful for describing a number of relationships between two entities. Figure 2, below, shows the entity David Bowie and the other entities to which he is related. Notice that each of the relationships between entities is clearly described. Also notice that David Bowie has two types of relationships with the song Heroes. One relationship is that he sings the song. The other relationship is that he composed the song.

As you can see, a triple provides descriptive capabilities that go well beyond the simple "has a" and "has some" relationships found in standard databases.

Figure 2: Using triples captures both entities and relationships
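A list of triples is already a graph, as this minimal Python sketch shows. Subjects and objects become vertices, and each predicate becomes a labeled edge. As a result, the two relationships between David Bowie and Heroes described above become two distinct edges between the same pair of vertices. The triples are taken from the examples in the text; the data structures are illustrative.

```python
from collections import defaultdict

# Each triple is (subject, predicate, object), taken from the examples above.
TRIPLES = [
    ("Bob", "likes", "fish"),
    ("David Bowie", "sings", "Heroes"),
    ("David Bowie", "composed", "Heroes"),
    ("Nicholas Roeg", "knows", "David Bowie"),
]

# Vertices (nodes) are every subject and object.
vertices = {s for s, _, _ in TRIPLES} | {o for _, _, o in TRIPLES}

# Edges are keyed by (subject, object); one pair can carry several predicates.
edges = defaultdict(list)
for subject, predicate, obj in TRIPLES:
    edges[(subject, obj)].append(predicate)

print(len(vertices))                     # 5
print(edges[("David Bowie", "Heroes")])  # ['sings', 'composed']
```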

Using triples to capture and describe relationships between entities in a standardized manner provides a way to unify all the data in a meaningful way, regardless of the Internet domain. However, more pieces are needed. A triple is well suited to identifying entities and the relationship(s) between them, but it does not describe the meaning of those relationships.

For example, consider the word "know." Does the word "know," as in Nicholas Roeg knows David Bowie, mean that the extent of Roeg's knowledge of Bowie is what he's read in the newspaper? Or does it mean that Roeg knows David Bowie from first-hand experience? "Know" can have two different meanings. Clearly, we need a mechanism that describes exactly what the word means when applied to the relationship. This mechanism is called a vocabulary.

Defining Meaning Using Vocabularies

A vocabulary, also known as an ontology, is a construct used in the Semantic Web to describe entities and relationships within a particular domain. You can think of a vocabulary as a dictionary that describes the usage of a word or term. For example, as we saw above, a vocabulary can specify whether "know" describes a passing acquaintance or a long-term associate.

For the Internet in general, and for HTML and XML documents in particular, a vocabulary is brought into a document by way of an XML namespace. The namespace references one or more vocabulary definitions, such as an XML Schema, an RDF Schema (RDFS), or a Web Ontology Language (OWL) ontology.

Listing 5 below shows the Friend of a Friend (foaf) ontology applied to the HTML document we presented earlier.

<html>
<head>
<title>Profile</title>
</head>
<body xmlns:foaf= "http://xmlns.com/foaf/0.1/">
  <div><span about="#Nicholas Roeg" instanceof="foaf:Person" property="foaf:name">Nicholas Roeg</span></div>
  <div><span about="#Nicholas Roeg" property="foaf:birthday">1928-08-15</span></div>
  <div>
    <div>Connections</div>
    <div>
      <ul>
      <li><a href="https://en.wikipedia.org/wiki/David_Bowie">
          <span about="#David Bowie" instanceof="foaf:Person" property="foaf:name">David Bowie</span>
        </a>
        <span about="#Nicholas Roeg" rel="foaf:knows" resource="#David Bowie"></span>
      </li>
      <li><a href="https://en.wikipedia.org/wiki/Rip_Torn">
          <span about="#Rip Torn" instanceof="foaf:Person" property="foaf:name">Rip Torn</span>
        </a>
        <span about="#Nicholas Roeg" rel="foaf:knows" resource="#Rip Torn"></span>
      </li>
      <li><a href="https://en.wikipedia.org/wiki/Candy_Clark">
          <span about="#Candy Clark" instanceof="foaf:Person" property="foaf:name">Candy Clark</span>
        </a>
        <span about="#Nicholas Roeg" rel="foaf:knows" resource="#Candy Clark"></span>
      </li>
      <li><a href="https://en.wikipedia.org/wiki/Buck_Henry">
          <span about="#Buck Henry" instanceof="foaf:Person" property="foaf:name">Buck Henry</span>
        </a>
        <span about="#Nicholas Roeg" rel="foaf:knows" resource="#Buck Henry"></span>
      </li>
      <li><a href="https://en.wikipedia.org/wiki/Mick_Jagger">
          <span about="#Mick Jagger" instanceof="foaf:Person" property="foaf:name">Mick Jagger</span>
        </a>
        <span about="#Nicholas Roeg" rel="foaf:knows" resource="#Mick Jagger"></span>
      </li>
      <li><a href="https://en.wikipedia.org/wiki/Susan_Stephen">
          <span about="#Susan Stephen" instanceof="foaf:Person" property="foaf:name">Susan Stephen</span>
        </a>
        <span about="#Nicholas Roeg" rel="foaf:knows" resource="#Susan Stephen"></span>
      </li>
      <li><a href="https://en.wikipedia.org/wiki/Theresa_Russell">
          <span about="#Theresa Russell" instanceof="foaf:Person" property="foaf:name">Theresa Russell</span>
        </a>
        <span about="#Nicholas Roeg" rel="foaf:knows" resource="#Theresa Russell"></span>
      </li>
    </ul>
    </div>
  </div>
</body>
</html>

Listing 5: Applying a Semantic Web vocabulary to a web page

Notice that the foaf markup applied to the HTML does not affect the rendering of the document in the browser, nor should it (see Figure 3). The purpose of using the Friend of a Friend ontology is to provide a way for machines to understand the semantics represented in the HTML.

Figure 3: Applying an XML based ontology to an HTML document does not affect visual rendering

How does a machine interpret the semantics? Take a look at Listing 6, below.

Listing 6: A snippet of HTML that implements the Friend of a Friend (foaf) vocabulary as an XML namespace

Notice how the XML namespace prefix foaf, declared on line 1 of Listing 6, is bound to an ontology on the Internet at http://xmlns.com/foaf/0.1. Notice also the namespace property name, which is used at line 3, and the namespace property birthday, which is used at line 6. Behind the scenes, the name and birthday properties, which are part of the foaf namespace, are used to describe data on the web page: in this case, the name "Nicholas Roeg" and the birthday "1928-08-15." What's interesting in terms of the Semantic Web is that the namespace is defined on the Internet (see Figure 4, below). Any machine that publishes or consumes the HTML therefore has a common reference point, the ontology at http://xmlns.com/foaf/0.1, by which the semantics on the web page can be understood.

Figure 4: Putting an ontology on the internet as a namespaced resource provides a semantic definition that is common to both publishers and consumers of information

Once the semantics are applied, it's quite possible to program a search algorithm with these instructions: go inspect all resources on the Internet that support the ontology defined at http://xmlns.com/foaf/0.1, and return all entities in which name="Nicholas Roeg" and birthday="1928-08-15".
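Here is that search at the smallest scale, as a minimal Python sketch using only the standard library. It expands namespace prefixes into full IRIs, pulls triples out of a trimmed copy of Listing 5, and then selects the entities whose foaf:name and foaf:birthday match. It understands only the about, property, rel, and resource attributes as Listing 5 uses them. A real crawler would rely on a full-featured parser for this kind of markup. The names expand, TripleExtractor, and find_subjects are illustrative.

```python
from html.parser import HTMLParser

# Prefix bindings, as declared in Listing 5 with xmlns:foaf="http://xmlns.com/foaf/0.1/".
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}

def expand(term: str) -> str:
    """Expand a prefixed term such as "foaf:name" into the full IRI it stands for."""
    prefix, _, local_name = term.partition(":")
    return PREFIXES[prefix] + local_name  # KeyError if the prefix was never declared

# A trimmed, well-formed excerpt of Listing 5 (instanceof attributes omitted).
PAGE = """
<body xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <div><span about="#Nicholas Roeg" property="foaf:name">Nicholas Roeg</span></div>
  <div><span about="#Nicholas Roeg" property="foaf:birthday">1928-08-15</span></div>
  <span about="#David Bowie" property="foaf:name">David Bowie</span>
  <span about="#Nicholas Roeg" rel="foaf:knows" resource="#David Bowie"></span>
</body>
"""

class TripleExtractor(HTMLParser):
    """Collect (subject, predicate IRI, object) triples from about/property and
    about/rel/resource attributes. Handles only the patterns used in Listing 5."""

    def __init__(self):
        super().__init__()
        self.triples: list[tuple[str, str, str]] = []
        self._pending = None  # (subject, predicate) waiting for the element's text

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        subject = a.get("about")
        if subject and a.get("property"):
            self._pending = (subject, expand(a["property"]))
        elif subject and a.get("rel") and a.get("resource"):
            self.triples.append((subject, expand(a["rel"]), a["resource"]))

    def handle_data(self, data):
        if self._pending and data.strip():
            subject, predicate = self._pending
            self.triples.append((subject, predicate, data.strip()))
            self._pending = None

    def handle_endtag(self, tag):
        self._pending = None  # an element with no text yields no literal

def find_subjects(triples, criteria):
    """Return the subjects whose literal values match every (predicate, value) pair."""
    subjects = {s for s, _, _ in triples}
    return sorted(s for s in subjects
                  if all((s, p, v) in triples for p, v in criteria.items()))

extractor = TripleExtractor()
extractor.feed(PAGE)
extractor.close()
matches = find_subjects(extractor.triples, {
    expand("foaf:name"): "Nicholas Roeg",
    expand("foaf:birthday"): "1928-08-15",
})
print(matches)  # ['#Nicholas Roeg']
```

The query compares full IRIs such as http://xmlns.com/foaf/0.1/name rather than the bare word "name." That is what lets two unrelated sites agree on what a property means.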

But, this is only the tip of the iceberg. Remember, we need a way to describe not only simple name-value pairs found in profile data but also the one or many relationships between entities. Again, ontologies solve this problem.

XML Schema, RDF Schema, and OWL support inheritance. Inheritance makes it possible to create an ontology that builds on work done previously in other ontologies. For example, the ontology published by the Semantically-Interlinked Online Communities (SIOC, pronounced "shock") extends the Friend of a Friend ontology by publishing properties that describe relations relevant to social media such as Facebook. One such property is "likes."

Listing 7 below is a snippet of the HTML displayed earlier. This time, though, the SIOC ontology is used instead of FOAF (see line 1). As mentioned earlier, the SIOC ontology supports the property "likes." Thus, we can use the "likes" property in conjunction with the "knows" property to describe the two relationships Nicholas Roeg has with David Bowie: Roeg likes Bowie (line 11), and Roeg knows Bowie (line 12).

Listing 7: Applying the SIOC ontology to describe a "likes" relationship

Applying the SIOC ontology to the HTML allows the web page to describe, in a machine-readable way, a multitude of relationships among the entities on the page. Listing 9 below shows the result of a semantic analysis of the web page we displayed earlier, once the SIOC ontology has been applied.

resource = http://localhost/#Nicholas Roeg
http://rdfs.org/sioc/ns#knows
- resource = http://localhost/#Mick Jagger
- http://rdfs.org/sioc/ns#name = Mick Jagger
http://rdfs.org/sioc/ns#birthday = 1928-08-15
http://rdfs.org/sioc/ns#knows
- resource = http://localhost/#David Bowie
- http://rdfs.org/sioc/ns#name = David Bowie
http://rdfs.org/sioc/ns#likes
- resource = http://localhost/#David Bowie
http://rdfs.org/sioc/ns#knows
- resource = http://localhost/#Rip Torn
- http://rdfs.org/sioc/ns#name = Rip Torn
http://rdfs.org/sioc/ns#likes
- resource = http://localhost/#Theresa Russell
- http://rdfs.org/sioc/ns#name = Theresa Russell
http://rdfs.org/sioc/ns#knows
- resource = http://localhost/#Susan Stephen
- http://rdfs.org/sioc/ns#name = Susan Stephen
http://rdfs.org/sioc/ns#knows
- resource = http://localhost/#Buck Henry
- http://rdfs.org/sioc/ns#name = Buck Henry
http://rdfs.org/sioc/ns#knows
- resource = http://localhost/#Candy Clark
- http://rdfs.org/sioc/ns#name = Candy Clark
http://rdfs.org/sioc/ns#name = Nicholas Roeg
http://rdfs.org/sioc/ns#knows
- resource = http://localhost/#Theresa Russell

Listing 9: The semantics of the data in the web page according to the SIOC ontology

Listing 10 shows the result of running the web page through a conversion algorithm that expresses the page's semantics as pure XML in the RDF/XML format (again, RDF being the foundation of the Semantic Web).

<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:soic="http://rdfs.org/sioc/ns#">
  <rdf:Description rdf:about="http://localhost#Nicholas Roeg">
    <soic:name>Nicholas Roeg</soic:name>
    <soic:birthday>1928-08-15</soic:birthday>
    <soic:likes rdf:resource="http://localhost#David Bowie"/>
    <soic:likes rdf:resource="http://localhost#Theresa Russell"/>
    <soic:knows rdf:resource="http://localhost#David Bowie"/>
    <soic:knows>
      <rdf:Description rdf:about="http://localhost#Rip Torn">
        <soic:name>Rip Torn</soic:name>
      </rdf:Description>
    </soic:knows>
    <soic:knows>
      <rdf:Description rdf:about="http://localhost#Candy Clark">
        <soic:name>Candy Clark</soic:name>
      </rdf:Description>
    </soic:knows>
    <soic:knows>
      <rdf:Description rdf:about="http://localhost#Buck Henry">
        <soic:name>Buck Henry</soic:name>
      </rdf:Description>
    </soic:knows>
    <soic:knows>
      <rdf:Description rdf:about="http://localhost#Mick Jagger">
        <soic:name>Mick Jagger</soic:name>
      </rdf:Description>
    </soic:knows>
    <soic:knows>
      <rdf:Description rdf:about="http://localhost#Susan Stephen">
        <soic:name>Susan Stephen</soic:name>
      </rdf:Description>
    </soic:knows>
    <soic:knows rdf:resource="http://localhost#Theresa Russell"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://localhost#David Bowie">
    <soic:name>David Bowie</soic:name>
  </rdf:Description>
  <rdf:Description rdf:about="http://localhost#Theresa Russell">
    <soic:name>Theresa Russell</soic:name>
  </rdf:Description>
</rdf:RDF>

Listing 10: The web page semantics translated into pure RDF/XML
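Because RDF/XML is ordinary XML, a generic XML parser can already recover the triples. This minimal Python sketch uses the standard library's ElementTree on a trimmed copy of Listing 10, with the optional XML declaration omitted. It handles only the three patterns that appear in the listing: rdf:resource references, nested rdf:Description elements, and plain text literals. A production system would use a dedicated RDF library. The helper names are illustrative.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A trimmed copy of Listing 10.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:soic="http://rdfs.org/sioc/ns#">
  <rdf:Description rdf:about="http://localhost#Nicholas Roeg">
    <soic:name>Nicholas Roeg</soic:name>
    <soic:likes rdf:resource="http://localhost#David Bowie"/>
    <soic:knows rdf:resource="http://localhost#David Bowie"/>
    <soic:knows>
      <rdf:Description rdf:about="http://localhost#Rip Torn">
        <soic:name>Rip Torn</soic:name>
      </rdf:Description>
    </soic:knows>
  </rdf:Description>
</rdf:RDF>"""

def iri(clark_name: str) -> str:
    """ElementTree reports names as '{namespace}local'; join them into one IRI."""
    namespace, _, local = clark_name[1:].partition("}")
    return namespace + local

def triples_from(description):
    """Yield (subject, predicate, object) for one rdf:Description,
    recursing into nested descriptions."""
    subject = description.get(f"{{{RDF}}}about")
    for prop in description:
        predicate = iri(prop.tag)
        resource = prop.get(f"{{{RDF}}}resource")
        nested = prop.find(f"{{{RDF}}}Description")
        if resource is not None:          # <soic:likes rdf:resource="..."/>
            yield subject, predicate, resource
        elif nested is not None:          # <soic:knows><rdf:Description ...>
            yield subject, predicate, nested.get(f"{{{RDF}}}about")
            yield from triples_from(nested)
        else:                             # <soic:name>literal</soic:name>
            yield subject, predicate, (prop.text or "").strip()

root = ET.fromstring(RDF_XML)
triples = [t for d in root.findall(f"{{{RDF}}}Description") for t in triples_from(d)]
for triple in triples:
    print(triple)
```

ElementTree resolves prefixes to namespace IRIs. The soic spelling of the prefix in Listing 10 therefore has no effect on the resulting triples; only the namespace IRI http://rdfs.org/sioc/ns# matters.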

And finally, Figure 5 shows a visual graph of the triples implicit in the web page illustrated previously in Figure 3, after the SIOC ontology has been applied. The illustration was created using the RDF validation tool published by the W3C.

Figure 5: Data defined in an RDF dataset expressed in a semantics graph

As you can see, the vision of the Semantic Web and its implementation by way of RDF does indeed offer a way to unify all data published on the Internet as meaningful information. But you need to know a lot to make it work. In addition to understanding the underlying concepts, you need a good degree of mastery in implementing ontologies. It's a daunting task that has, for the most part, kept Semantic Web development outside the commercial mainstream. But companies being slow to adopt the formal application of RDF does not mean that information isn't being published in the spirit of the Semantic Web. Technologies in wide commercial use deliver on parts of that vision, whether or not they were designed with it in mind. GraphQL is one of the more prominent.

Simplifying the Semantic Web Using GraphQL

The Internet and open standards have changed the fundamental way that applications access data. In the past, a software application was typically dedicated to a single database. Not only was the database proprietary (for example, Oracle, SQL Server, or IBM DB2), but the connection protocol used to access the database was proprietary as well. Today, it's not unusual for a single application to work with data that resides in a variety of different types of databases. And, more often than not, the application will connect to that data using a variety of openly documented protocols such as HTTP, SSH, RTMP, and XMPP. To promote easy reuse, the trend is to put a generic data access layer between the application and the various data storage technologies. This generic layer is what we have come to know as the Application Programming Interface (API).

The Rise of the RESTful API

Using APIs is becoming the standard way for desktop and mobile applications to access data on the web. At the time this article was published, the most familiar API architectural style was REST. The premise behind REST is that data exists on the Internet as resources. A REST API publishes data as resources, each identified by a URI, as in the following example:

https://openlibrary.org/api/books?bibkeys=ISBN:0451526538

WHERE

https://openlibrary.org/api/ is the root URI of the API, otherwise known as the API's "endpoint"

https://openlibrary.org/api/books is the location of the resource, in this case books that are known to openlibrary.org

? is the character that separates the resource URL from the query parameters

bibkeys=ISBN:0451526538 is a query parameter: the parameter bibkeys is assigned the value ISBN:0451526538, which identifies a particular book in the library.
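Python's standard library splits the URI along the same lines as the breakdown above. This minimal sketch only parses the string and makes no request to Open Library.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://openlibrary.org/api/books?bibkeys=ISBN:0451526538"
parts = urlsplit(url)

resource = f"{parts.scheme}://{parts.netloc}{parts.path}"  # everything before the "?"
params = parse_qs(parts.query)                              # query string as a dict of lists

print(resource)  # https://openlibrary.org/api/books
print(params)    # {'bibkeys': ['ISBN:0451526538']}
```

parse_qs maps each key to a list because a query string may repeat a parameter.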

The data returned by that call to the URI is shown below, formatted as JSON (vs. XML, RDF, etc.) in Listing 11.

{
    "ISBN:0451526538": {
        "bib_key": "ISBN:0451526538",
        "preview": "noview",
        "thumbnail_url": "https://covers.openlibrary.org/b/id/295577-S.jpg",
        "preview_url": "https://openlibrary.org/books/OL1017798M/The_adventures_of_Tom_Sawyer",
        "info_url": "https://openlibrary.org/books/OL1017798M/The_adventures_of_Tom_Sawyer"
    }
}

Listing 11: A RESTful API publishes data in open formats such as JSON or XML.

REST provides a generic way to access data that's published in open formats like JSON and XML. Thus, it scales well. But there are two drawbacks. First, REST can increase the network traffic between the application and the server. Second, the overall semantics of the published data remain obscure in terms of machine readability.

As you can see in Listing 11 above, the JSON returned by a call to the RESTful API has fields that contain string data. The fields bib_key and preview contain simple string data. However, the values of the fields thumbnail_url, preview_url, and info_url are URLs that can only be followed on subsequent trips to the network. The implication is that to get all the information relevant to a particular book, a human or machine must make multiple trips to the network, hence the increased network traffic. Again, this is the first problem.
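Here is a minimal Python sketch of that first problem. It scans the record from Listing 11 for fields whose values are absolute http(s) URLs; each one is another trip to the network. The helper name follow_up_links is hypothetical, and nothing here calls the live Open Library API.

```python
import json
from urllib.parse import urlsplit

# The response body from Listing 11.
RESPONSE = json.loads("""{
    "ISBN:0451526538": {
        "bib_key": "ISBN:0451526538",
        "preview": "noview",
        "thumbnail_url": "https://covers.openlibrary.org/b/id/295577-S.jpg",
        "preview_url": "https://openlibrary.org/books/OL1017798M/The_adventures_of_Tom_Sawyer",
        "info_url": "https://openlibrary.org/books/OL1017798M/The_adventures_of_Tom_Sawyer"
    }
}""")

def follow_up_links(record: dict) -> dict[str, str]:
    """Return the fields whose values are absolute http(s) URLs, i.e. data
    the client can only obtain by making another request."""
    return {
        field: value
        for field, value in record.items()
        if isinstance(value, str) and urlsplit(value).scheme in ("http", "https")
    }

book = RESPONSE["ISBN:0451526538"]
links = follow_up_links(book)
print(sorted(links))            # ['info_url', 'preview_url', 'thumbnail_url']
print(len(set(links.values()))) # 2 distinct URLs to fetch
```

In this particular response, preview_url and info_url point to the same page. A careful client could therefore get away with two extra requests instead of three, but it still can't avoid the extra round trips.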

The second problem is that we still have no well-defined understanding of what the fields mean. For example, does the field preview contain preview data, or does it indicate whether a preview is available? A human who looks at the key-value pair "preview": "noview" might infer that the field indicates whether a preview exists and that the assigned value, "noview," means it doesn't. While a human can intuit this pretty quickly, a machine will be completely baffled. Without an ontological reference, the meaning of both the field and the value assigned to it is unknown. REST APIs do indeed solve a great many problems related to publishing data on the Internet. Yet they fall short of fulfilling the promise of the Semantic Web. Clearly, something better is needed. This is where GraphQL comes in.

Using GraphQL to Realize the Semantic Web

Whereas publishing data to the Semantic Web has traditionally been an arduous undertaking, GraphQL simplifies much of the process. Much as triples (see Figure 1) are the foundation of the Semantic Web, they also serve as the foundation of GraphQL. As you might recall, a triple describes a relationship in three parts: the subject, the predicate, and the object. For example, in the statement Bob likes fish, Bob is the subject, likes is the predicate, and fish is the object.

While the GraphQL specification is very exact about how the objects of a graph (aka nodes) are to be implemented, it says nothing about how to describe an edge (aka a predicate). In fact, all that's really implied is the basic parent-child, "has a" relationship (e.g., a movie has a director).

The GraphQL specification does not presently offer a standard way to define predicates. However, a convention has emerged: the GraphQL developer community calls an edge (aka a predicate) a connection. A connection is defined by appending the suffix Connection to a descriptor (e.g., likesConnection, knowsConnection), as shown in Figure 6 below.

Figure 6: Using the Connection suffix is a convention that has emerged in the GraphQL community for identifying edges (aka predicates)

The tricky part is that while the notion of a connection implies a predicate, at the implementation level a connection is an array of the nodes that satisfy that predicate. For example, a likesConnection is an array of person entities, each of which is liked by the entity that "owns" the likesConnection array. (See Figure 7.)

Figure 7: The "Connection" naming convention implies the relationship between an entity and an array of entities

Actually, the way developers code to the connections convention in GraphQL's Schema Definition Language varies. Remember, the only thing the convention really requires is naming the array of entities with the suffix Connection. While the naming part is easy, things can get complex in the implementation.

While not part of the general GraphQL specification, connections are supported in the Relay specification. Relay is a JavaScript framework for building applications with React. React is the library that Facebook and other companies use to build user interfaces. Both Relay and React are open source projects published by Facebook.

Take a look at Listing 12 below, which shows a GraphQL query named searchPerson. The query searches for a person according to the firstName and lastName parameters and returns a paginated collection of the entities that have the specified first and last names.

Listing 12: A GraphQL query that declares a likesConnection

Pagination in GraphQL is a complex topic that we discussed in Part 3 of this series, so we won't go into the details here. The important thing to understand about Listing 12 is that, in addition to returning the firstName and lastName of each person found (lines 7 and 8), the query is configured at line 9 to return a likesConnection for that person too.

As mentioned previously, by convention a Connection is an array of objects that have a particular relationship to the owner. However, things can get confusing when you look at the structure of an object in the likesConnection array. It's not a simple person object. The person information is there, but it's nested inside the node object at Listing 12, line 14. The reason for the nesting is so that the likesConnection can support pagination.

In the real world, on Facebook for example, one person can like hundreds, maybe thousands, of items, so returning the entire list at once is impractical. Instead, information is returned in chunks. In the case of the search query in Listing 12, likesConnection information is returned as an array of edges (Listing 12, line 13), in which each edge has a node (Listing 12, line 14) and a cursor (Listing 12, line 18). A cursor identifies the position of a particular edge within the overall list of edges that the API holds, which lets a client request the next chunk starting after that edge. The node contains the information specific to the person, such as firstName and lastName.
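
The following sketch, again in plain Node.js JavaScript, shows one way a resolver might slice a long list into pages of edges, each with a node and a cursor, plus a pageInfo object. It loosely follows the `first`/`after` arguments and `pageInfo` fields (`hasPreviousPage`, `hasNextPage`, `startCursor`, `endCursor`) described in Relay's connection specification. The `paginate` function and the choice to encode cursors as base64 offsets are assumptions for illustration, not how the IMBOB API implements them.

```javascript
// Opaque cursors: here a cursor is simply the base64-encoded offset of an
// edge within the full list (an assumption made for this sketch).
const toCursor = (offset) => Buffer.from(`offset:${offset}`).toString("base64");
const fromCursor = (cursor) =>
  Number(Buffer.from(cursor, "base64").toString("utf8").split(":")[1]);

// Returns one page of a connection: up to `first` edges after the `after` cursor.
function paginate(items, { first = 10, after = null } = {}) {
  const start = after === null ? 0 : fromCursor(after) + 1;
  const slice = items.slice(start, start + first);
  const edges = slice.map((node, i) => ({ node, cursor: toCursor(start + i) }));
  return {
    edges,
    pageInfo: {
      hasPreviousPage: start > 0,
      hasNextPage: start + first < items.length,
      startCursor: edges.length ? edges[0].cursor : null,
      endCursor: edges.length ? edges[edges.length - 1].cursor : null,
    },
  };
}

// Hypothetical list of five liked people, fetched two at a time.
const liked = ["Ann", "Ben", "Cal", "Dee", "Eve"].map((firstName) => ({ firstName }));
const page1 = paginate(liked, { first: 2 });
const page2 = paginate(liked, { first: 2, after: page1.pageInfo.endCursor });
console.log(page2.edges.map((edge) => edge.node.firstName)); // [ 'Cal', 'Dee' ]
```

The client never inspects a cursor's contents; it just hands the last endCursor back as `after`, which leaves the server free to change how cursors are encoded.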

Figure 8 below shows an excerpt from the result of the searchPerson query when executed in GraphQL Playground against the IMBOB demonstration API that accompanies this series.

Figure 8: A likesConnection returns an edges collection

The reason for this level of complexity around a Connection is to keep the notion of nodes and edges at the forefront of GraphQL schema design. Remember, the Semantic Web defines relationships by way of the triple, which is made up of an edge between two nodes. Thus, a Connection is associated with a single entity (Nicholas Roeg, for example), and that entity can have a number of similar edges, each associated with exactly one other entity. Admittedly, the technique is a bit awkward, but it does support the underlying principle of a triple: subject, predicate, object. In terms of the Connection convention, it's a subject, one or more predicates (edges), and the object associated with each. It takes some getting used to, but it works. Will a better convention emerge? Time will tell. This leads us to a frank analysis of how well GraphQL supports the promise of the Semantic Web.
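
Because each edge joins the owning entity to exactly one node, a connection can be flattened back into Semantic Web-style triples. The sketch below uses two hypothetical helpers, `predicateFromField` and `connectionToTriples`. Deriving the predicate by stripping the Connection suffix is itself an assumption, since nothing in the GraphQL specification defines it.

```javascript
// Derives a predicate name from a field that follows the Connection naming
// convention, e.g. "likesConnection" -> "likes".
function predicateFromField(fieldName) {
  const suffix = "Connection";
  if (!fieldName.endsWith(suffix) || fieldName.length === suffix.length) {
    throw new Error(`${fieldName} does not follow the Connection convention`);
  }
  return fieldName.slice(0, -suffix.length);
}

// Flattens one connection into [subject, predicate, object] triples:
// one triple per edge, because each edge joins the owner to exactly one node.
function connectionToTriples(subjectId, fieldName, connection) {
  const predicate = predicateFromField(fieldName);
  return connection.edges.map((edge) => [subjectId, predicate, edge.node.id]);
}

// Hypothetical excerpt of one person in a searchPerson result.
const person = {
  id: "person:1",
  likesConnection: {
    edges: [
      { node: { id: "person:7" }, cursor: "c1" },
      { node: { id: "person:9" }, cursor: "c2" },
    ],
  },
};

// Logs two triples, both with "likes" as the predicate.
console.log(connectionToTriples(person.id, "likesConnection", person.likesConnection));
```

The cursor plays no part in the triple; it exists only to support pagination.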

GraphQL and the Semantic Web: Good But Not Complete

In terms of speed, ease of use and flexibility, GraphQL offers significant advantages over RESTful APIs. GraphQL supports declarative resultset definition. Unlike RESTful APIs, in which the structure of a resultset is predefined and fixed, GraphQL lets you define the exact structure of the resultset you want returned when you create a query. You don't incur the overhead of parsing through unwanted response data to get the information you need. With GraphQL, it's possible to get only what you need when you need it.
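
The following toy sketch illustrates the idea of a declarative resultset in plain JavaScript: the caller describes the shape it wants and gets back only those fields. It is not how a GraphQL server works internally (a real server parses a query document and runs resolvers); the `select` function, the object-based selection format and the sample movie record are hypothetical simplifications.

```javascript
// Returns a copy of `data` that keeps only the fields named in `selection`.
// A selection maps each field name to `true` (return the value as-is) or to a
// nested selection (descend into an object, or into each element of an array).
function select(data, selection) {
  if (data === null || data === undefined) return null;
  if (Array.isArray(data)) return data.map((item) => select(item, selection));
  const result = {};
  for (const [field, sub] of Object.entries(selection)) {
    result[field] = sub === true ? (data[field] ?? null) : select(data[field], sub);
  }
  return result;
}

// Hypothetical full record, as a server might store it.
const movie = {
  id: "movie:1",
  title: "Example Movie",
  runtimeMinutes: 139,
  director: { id: "person:1", firstName: "Jane", lastName: "Doe", birthplace: "Springfield" },
  actors: [
    { id: "person:2", firstName: "Sam", lastName: "Lee", age: 40 },
    { id: "person:3", firstName: "Pat", lastName: "Kim", age: 35 },
  ],
};

// The client asks for just the title, the director's last name and actors' first names.
const shaped = select(movie, { title: true, director: { lastName: true }, actors: { firstName: true } });
console.log(JSON.stringify(shaped));
```

Everything the client didn't ask for (ids, runtime, ages, birthplace) is simply absent from the result, so there's nothing extra to parse.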

GraphQL supports nested queries that retrieve related entities in a single request, which can cut down considerably on network traffic. And the schema of a given API is discoverable to both humans and machines at runtime via GraphQL introspection. These are significant benefits.
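
Since an introspection response is ordinary JSON, a program can discover a schema's relationships without human help. The sketch below assumes the response to a `__schema` introspection query has already been fetched; the sample response is trimmed and hypothetical. Named types carry a `name`, while NON_NULL and LIST wrapper types have a null `name` and point to the wrapped type through `ofType`, which the hypothetical `namedType` helper follows.

```javascript
// The introspection query a client would send; it is not executed in this sketch.
const INTROSPECTION_QUERY = `{
  __schema {
    types {
      name
      kind
      fields {
        name
        type { name kind ofType { name kind ofType { name kind } } }
      }
    }
  }
}`;

// Hypothetical, trimmed response to that query.
const response = {
  data: {
    __schema: {
      types: [
        {
          name: "Person", kind: "OBJECT",
          fields: [
            { name: "firstName", type: { name: null, kind: "NON_NULL", ofType: { name: "String", kind: "SCALAR", ofType: null } } },
            { name: "likesConnection", type: { name: "PersonConnection", kind: "OBJECT", ofType: null } },
          ],
        },
        {
          name: "PersonConnection", kind: "OBJECT",
          fields: [
            { name: "edges", type: { name: null, kind: "LIST", ofType: { name: "PersonEdge", kind: "OBJECT", ofType: null } } },
          ],
        },
        { name: "String", kind: "SCALAR", fields: null },
        { name: "__Type", kind: "OBJECT", fields: [{ name: "name", type: { name: "String", kind: "SCALAR", ofType: null } }] },
      ],
    },
  },
};

// Follows NON_NULL / LIST wrappers (which have no name) down to the named type.
function namedType(type) {
  let t = type;
  while (t && t.name === null) t = t.ofType;
  return t ? t.name : null;
}

// Lists [typeName, fieldName, targetType] for every field of every object type,
// skipping the built-in introspection types whose names start with "__".
function schemaEdges(introspection) {
  return introspection.data.__schema.types
    .filter((t) => t.kind === "OBJECT" && !t.name.startsWith("__"))
    .flatMap((t) => t.fields.map((f) => [t.name, f.name, namedType(f.type)]));
}

console.log(schemaEdges(response));
```

Note that the discovered names are just strings: nothing in the response says what likesConnection means.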

Yet, in terms of supporting the Semantic Web, there's still work to be done. While the emerging Connections convention is a useful way to define and determine the edges and nodes within an API in order to create semantic representations between entities, it has an inherent drawback: it is simply a convention, not a standard. Developers can take it or leave it with little consequence. To put it another way, the Java community might prefer to see code written according to the camel-casing convention, but no compiler is going to barf if it's written otherwise. However, violating Java syntax by putting a curly bracket in the wrong place will bring the compiler to its knees until the mistake is corrected.

Industries tend to support standards. Thus, in order for Connections to become the standard by which semantics are defined within a GraphQL API, and hence become machine discoverable, the convention must become part of the GraphQL specification. Otherwise, it is simply an arbitrary nice-to-have that's popular among developers.
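
Because the convention isn't part of the specification, conformance has to be checked by separate tooling, much like a style linter. The sketch below is a hypothetical checker over a simplified schema map (type name to field types, with list types written as "[Type]"); a real tool would build that map from introspection. It flags any field ending in Connection whose type lacks edges and pageInfo, or whose edge type lacks node and cursor. The schema is perfectly valid GraphQL either way, which is exactly the point.

```javascript
// Hypothetical, simplified schema: type name -> { fieldName: typeName }.
// knowsConnection breaks the convention by returning a plain list of Person.
const schema = {
  Person: { firstName: "String", likesConnection: "PersonConnection", knowsConnection: "[Person]" },
  PersonConnection: { edges: "[PersonEdge]", pageInfo: "PageInfo" },
  PersonEdge: { node: "Person", cursor: "String" },
  PageInfo: { hasNextPage: "Boolean", endCursor: "String" },
};

// Strips list brackets: "[PersonEdge]" -> "PersonEdge".
const unwrap = (typeName) => typeName.replace(/^\[|\]$/g, "");

// Returns a list of human-readable convention violations.
function checkConnections(types) {
  const problems = [];
  for (const [typeName, fields] of Object.entries(types)) {
    for (const [field, fieldType] of Object.entries(fields)) {
      if (!field.endsWith("Connection")) continue;
      const conn = types[unwrap(fieldType)];
      if (!conn || !conn.edges || !conn.pageInfo) {
        problems.push(`${typeName}.${field}: missing edges/pageInfo`);
        continue;
      }
      const edge = types[unwrap(conn.edges)];
      if (!edge || !edge.node || !edge.cursor) {
        problems.push(`${typeName}.${field}: edges lack node/cursor`);
      }
    }
  }
  return problems;
}

console.log(checkConnections(schema));
```

A server would run this schema without complaint; only the separate check notices that knowsConnection is not a connection in the conventional sense.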

Finally, there is the larger problem of support for authoritative vocabularies. As of now, there is nothing in the GraphQL ecosystem comparable to FOAF or SIOC: no single authority that verifies and publishes semantic vocabularies in a standard manner, on par with the vocabularies expressed under the W3C's RDF specification and its XML serialization. Without an ontological standard, there is no definitive way to distinguish the meanings behind the way different GraphQL APIs use the same term; the "knows" example we described earlier is a case in point. More importantly, without a standard way to publish ontologies, the underlying semantics supported by an API are unknowable to a machine, especially one in another domain that may conform to different conventions. A human can "figure it out"; a machine can't, although this might change in the future as machine intelligence matures. To fulfill the promise of the Semantic Web, the information in a GraphQL API must be discoverable and meaningful to humans and machines alike. Until then, GraphQL is proving to be an effective framework for developers, but it is not yet a full-fledged participant in the Semantic Web. The technology still has a way to go.
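
One way to attach meaning is to pin each field to a term in a published vocabulary. GraphQL has no standard mechanism for doing so, so the mapping table and the `toNTriples` helper below are hypothetical. The FOAF `knows` IRI is a real published term; the example.org IRIs are invented to show two APIs using a knowsConnection field to mean different things. The output follows the W3C N-Triples line format.

```javascript
// Hypothetical per-API mappings from GraphQL field names to predicate IRIs.
// Only the FOAF IRI is a real published term; the example.org IRIs are made up.
const vocabularies = {
  socialApi: { knowsConnection: "http://xmlns.com/foaf/0.1/knows" },
  skillsApi: { knowsConnection: "https://example.org/vocab/hasSkill" },
};

// Emits one N-Triples line per object: <subject> <predicate> <object> .
function toNTriples(api, subjectIri, fieldName, objectIris) {
  const predicate = (vocabularies[api] || {})[fieldName];
  if (!predicate) throw new Error(`no vocabulary term for ${api}.${fieldName}`);
  return objectIris.map((o) => `<${subjectIri}> <${predicate}> <${o}> .`);
}

// Same field name, two different meanings.
console.log(toNTriples("socialApi", "https://example.org/people/1", "knowsConnection", ["https://example.org/people/2"]));
console.log(toNTriples("skillsApi", "https://example.org/people/1", "knowsConnection", ["https://example.org/skills/graphql"]));
```

Without the table, a machine sees only the string knowsConnection in both APIs and has no way to tell that one means acquaintance and the other means a skill.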

Next Steps: Moving From Theory to Practice

So far in this series we've covered the history that led to the emergence of GraphQL. We took an in-depth look at the specification and presented a demonstration API that shows you how to implement a GraphQL API under Node.js using Apollo Server. In this installment, we looked at GraphQL in terms of the Semantic Web.

In the next and final installment, we'll look at how companies adopted GraphQL in the real world. We'll look at the challenges they encountered and how they addressed them. Also, we'll examine how they benefited from using GraphQL and the lessons they learned.

Be sure to read the next GraphQL article: How Companies are Making GraphQL Work in the Real World