TypeDB by Vaticle

TypeDB (previously Grakn.ai)

Not so long ago, the relational database was the storage foundation of pretty much everything. When the NoSQL (not only SQL) movement started, the focus was on schema-less flexibility to power the JavaScript revolution, driven mainly by JSON-like data structures. Graph structures are considered part of the NoSQL family, and although they have always been around, only recently has an explosion of graph backends hit the market. Partly driven by the machine-learning and AI boom, partly by a shift in thinking, graphs are now prominent in software solutions.

At the same time, one has to admit that there is a long, albeit academic, tradition of representing the world conceptually via so-called ontologies and triples. Frameworks, theories and software for dealing with ontologies have been around for a while and are used mainly in biotech research (especially cancer research) and by governments. These tools emphasize interoperability but also come with a hefty dose of theory and quirkiness alien to modern software development.

In practice, two very different roads pave the world of graphs: the aforementioned triples and property graphs. Some vendors offer hybrid solutions, but from an API point of view things are never hybrid; only the underlying storage is. The big names in the triple world are AllegroGraph, MarkLogic and Stardog, as well as the unavoidable (open source) Apache Jena. This world is fairly stable, and it’s the property-graph world where a lot is happening these days. Established database vendors have jumped onto the property-graph wagon: Microsoft SQL Server, Redis, Oracle and so on. What this direction lacks is precisely what makes the triple world so seductive: ontologies and inference. Ontologies are effectively schemas acting as blueprints for (future) data, that is, an abstract representation of a domain. Inference means one can generate insights (intelligence) from data through rules. Rules embody common sense inside a store.

So, one has a tension between two poles: a world of triples and academic thinking (somewhat disconnected from modern software pipelines), and a world of swift storage that lacks depth (i.e. inference and ontological thinking). To be totally honest, one should add to this the need for on- and off-premise deployments, security concerns, cloud thinking and scalability. Finding an appropriate graph-like backend is, hence, a conundrum that forces one either to develop custom solutions or to drop some requirements altogether.
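To make "inference through rules" concrete, here is a toy forward-chaining sketch over (subject, predicate, object) triples. It assumes a single hypothetical transitivity rule on a made-up `subclass_of` predicate. It is not Graql syntax and not how Grakn/TypeDB or any triple store actually implements its reasoner. The rule is applied repeatedly until no new triples appear, which is a fixed point.

```python
# Toy forward-chaining inference over triples (illustrative only; not Grakn/TypeDB's engine).
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def infer_transitive(facts: Set[Triple], predicate: str) -> Set[Triple]:
    """Apply the rule (a p b) and (b p c) => (a p c) until nothing new is derived."""
    inferred = set(facts)
    while True:
        # Join every (a p b) with every (b p c) to derive (a p c); keep only unseen triples.
        new = {
            (a, predicate, c)
            for (a, p1, b) in inferred if p1 == predicate
            for (b2, p2, c) in inferred if p2 == predicate and b2 == b
        } - inferred
        if not new:  # fixed point: the rule adds nothing further
            return inferred
        inferred |= new


# Hypothetical example data: a tiny class hierarchy.
facts = {
    ("cat", "subclass_of", "mammal"),
    ("mammal", "subclass_of", "animal"),
    ("animal", "subclass_of", "living_thing"),
}
closure = infer_transitive(facts, "subclass_of")
print(("cat", "subclass_of", "living_thing") in closure)  # True
```

The stored data never says that a cat is a living thing; the rule derives it. Because this rule only recombines entities that already exist, the loop must terminate: there are finitely many possible triples. Rules that create new entities have no such guarantee. That is the kind of cascading-inference pitfall a rule language has to guard against.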

This is where Grakn comes in, filling a sweet spot in the brave new world of graphs and its ramifications. Built atop existing open source solutions, it boasts a number of features no other vendor offers. It effectively reconciles the tension mentioned above via a query and inference engine sugared with a domain-specific language (a DSL called Graql). Although the underlying storage is a genuine property graph, the DSL guides you in a way familiar from ontologies and embraces inference in a natural fashion. This is no small feat and represents valuable intellectual property. Grakn has developed something innovative and modern in a space where big players seem to struggle to grasp how graphs, conceptual thinking and AI can work hand in hand. Only Neo4j comes close, but it lacks many of these goodies.

As a young and small startup, it has the virtue of being agile, fast and bright, but, like any other fresh startup, it also exhibits the weaknesses typical of a new player.

High momentum means innovation, but if one seeks enterprise-level software, maturity can be an issue. This shows in simple things, such as the lack of a basic recipe for backup and restore. It also shows in the absence of proper benchmarks and of a clear description of free versus paid features on the website.

From an architectural point of view, the dependency on JanusGraph and Cassandra is a good choice but not fully exploited. The pluggable nature of JanusGraph allows one to tap into Spark and Hadoop for machine learning, to replace Cassandra with other supported storage backends and to scale things as needed. These opportunities are not expressed anywhere in Grakn unless one digs into the source code. This is probably a strategic choice, but from our angle it is an avenue with much potential. Spark has become the de facto choice for big-data machine learning, and many companies have custom pipelines and data lakes connected to it. Being able to integrate Grakn in this way is an underrated bonus.

On an API and DSL level, one can see that much thinking has gone into developing something unique and practical. Certain dangerous boundaries have been explored (e.g. potentially endless cascading inference rules), and the DSL has been tuned accordingly to avoid pitfalls. Much as SPARQL and Cypher are open standards, it would be a good move to lift Graql to a level where the community can pick it up independently of the underlying storage engine. Extending the language with custom constraints (say, date-time clauses) should be ‘easy’ and should not demand a full checkout and build. Apache Jena and its Java-based SDK for extending SPARQL come to mind.

Finally, perception is everything these days. From a marketing point of view, the Grakn website is not well tuned, and answers to standard questions are somewhat indirect: what is the enterprise offering, what is the pricing model, what about consultancy and licensing, what is the deployment model and how does it scale, and what about encryption and security? It sits somewhere between a very promising open source product and a company aiming for acquisition and/or investors.

All in all, a very promising product with unique features in a crowded market. It is not enterprise-ready yet, but it could be within a year. Provided it grows in the right way (or with the right guidance), it is a company well worth keeping an eye on.
