Knowledge representation and reasoning (KR) is the field of artificial intelligence (AI) dedicated to representing information about the world in a form that a computer system can utilize to solve complex tasks such as diagnosing a medical condition or having a dialog in a natural language. Knowledge representation aka semantics incorporates findings from psychology about how humans solve problems and represent knowledge in order to design formalisms that will make complex systems easier to design and build. Knowledge representation and reasoning also incorporates findings from logic to automate various kinds of reasoning, such as the application of rules or the relations of sets and subsets.

If you want to play with SPARQL and triples you will find that you end up with a few options: Apache Jena (Fuseki), Stardog and BrightstarDB. It seems that BrightstarDB is not active anymore and the free version of Stardog is limited to millions of triples. In one of our projects we used Fuseki for a POC and found it to be OK until we hit various incomprehensible problems and loss of data. To be clear: do not use Apache Jena beyond a POC and simple setups. The problem is that you will not find any other open source SPARQL alternative and are, hence, forced to buy expensive licenses.

Considering that Jena is relatively old and widely known, one would think that it’s battle-tested and that, though it does not boast enterprise features (e.g. clustering), it should be good to go. Not. For instance:

  • we found that concurrent reading and writing of triples would bring the service to a halt with as little as three users
  • reading after writing sometimes requires one to insert a delay in order to let Jena digest the changes
  • deleted databases persist on disk while not visible in Jena
  • data sometimes disappears for no reason

and one cannot but systematically distrust the service. Some of the issues likely have their origin in Fuseki (the REST service on top of Jena) but others are definitely deep inside Jena. This situation adds to the problematic acceptance of semantic thinking in the industry. Ontologies and related tools (like Stanford’s Protégé) still very much feel like academic inventions and, although there is a growing interest, the field is still far away from the relational standard. A bit like the R language for statistical analysis: massive value but quirky at its core.
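The read-after-write workaround from the list above (inserting a delay so the store can digest a change) is a little less fragile when written as polling with a timeout rather than a fixed sleep. What follows is only a minimal sketch: `read_after_write`, `read` and `is_visible` are hypothetical names, not part of Jena or Fuseki, and `read` stands for whatever query you would run against your own endpoint.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def read_after_write(
    read: Callable[[], T],
    is_visible: Callable[[T], bool],
    timeout: float = 2.0,
    interval: float = 0.05,
) -> Optional[T]:
    """Poll `read` until `is_visible` accepts the result or `timeout` seconds pass.

    Returns the first accepted result, or None if the change never showed up.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = read()
        if is_visible(result):
            return result
        if time.monotonic() >= deadline:
            return None
        # Give the store some time before asking again.
        time.sleep(interval)
```

A caller would write its triples first and then pass a function that runs the follow-up query, treating a `None` result as a failure instead of silently continuing with stale data.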

So, using open source triple stores for real-world applications? Not yet. If your triple count is in the trillions you will have to go for MarkLogic and the like. Which is really a shame, since there are so many great open source or free solutions in the relational world.

On knowledge representation through ontology logs and how it’s related to category theory.

A multi-part series on knowledge graph modeling aka semantics and related technologies.

SPARQL is pronounced ‘sparkle’ and is a recursive acronym for SPARQL Protocol and RDF Query Language. It looks similar to SQL but is also totally different: there are no fixed table and field names to query against, so a query matches patterns of triples instead, and it also has to handle named graphs and links attached to links.
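To make the difference with SQL a bit more tangible, here is a toy illustration of how a single triple pattern is matched. It is a sketch in plain Python, not a SPARQL engine: the data, the `ex:` names and the `match` function are all made up for the example, and real SPARQL adds joins across patterns, filters, named graphs and much more.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Illustrative data only; every identifier here is invented.
TRIPLES: List[Triple] = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:name", "Alice"),
]


def match(pattern: Triple, triples: List[Triple]) -> Iterator[Dict[str, str]]:
    """Yield one variable binding per triple that fits the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in triples:
        binding: Dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


# Roughly what this SPARQL query asks for:
#   SELECT ?who ?whom WHERE { ?who ex:knows ?whom }
results = list(match(("?who", "ex:knows", "?whom"), TRIPLES))
```

Note that nothing resembling a table or column is named anywhere: the pattern only says which positions are fixed and which are variables, and the answer is a list of variable bindings.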

A multi-part series on knowledge graph modeling, semantics and related technologies.