AN INTRODUCTION TO KNOWLEDGE REPRESENTATION

(This is part of a multi-part series on semantics and reasoning.)

An overview of how dotnetRDF works in-memory and with a semantic store.

#r "dotNetRDF"
#r "Microsoft.VisualStudio.TestPlatform.TestFramework"
#r "Microsoft.VisualStudio.TestPlatform.TestFramework.Extensions"
using System;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using VDS.RDF;
using VDS.RDF.Parsing;
using VDS.RDF.Query;
using VDS.RDF.Query.Inference;
using VDS.RDF.Storage;
using VDS.RDF.Writing;
using VDS.RDF.Writing.Formatting;
using System.IO;

This formatter is used below to display things in a more readable fashion.

public class SimpleFormatter : INodeFormatter
{
    public string Format(INode n)
    {
        var raw = n.ToString();
        raw = raw.Substring(raw.LastIndexOf("/") + 1);
        return raw.IndexOf("#") > -1 ? raw.Substring(raw.LastIndexOf("#") + 1) : raw;
    }

    public string Format(INode n, TripleSegment? segment)
    {
        throw new NotImplementedException();
    }
}
var formatter = new SimpleFormatter();

In-memory graphs

Creating graphs and triples is as easy as you’d think it is:

Func<IGraph> CreateLittleGraph = () => {
    // create a graph
    IGraph g = new Graph();
    // this means that if we store it, it will be found
    // as part of this namespace
    var pandoraNamespace = "http://www.pandoraintelligence.com";
    g.BaseUri = new Uri(pandoraNamespace);
    // you can further organize nodes in subspaces (cf. node types)
    g.NamespaceMap.AddNamespace("person", UriFactory.Create($"{pandoraNamespace}/person/"));
    g.NamespaceMap.AddNamespace("relation", UriFactory.Create($"{pandoraNamespace}/relation/"));

    var swa = g.CreateUriNode("person:Swa");
    var peter = g.CreateUriNode("person:Peter");
    var dave = g.CreateUriNode("person:Dave");
    var workswith = g.CreateUriNode("relation:works_with");

    // asserting a triple is the same as adding it to the graph
    g.Assert(new Triple(swa, workswith, peter));
    g.Assert(new Triple(swa, workswith, dave));
    return g;
};

var g = CreateLittleGraph();

// the current in-memory graph consists of triples
foreach (var t in g.Triples)
{
    Console.WriteLine(t.ToString());
}
Assert.AreEqual(2, g.Triples.Count);

There are various formats around to serialize graphs and triples, and the package lets you use pretty much all of them.

For example, the N-Triples format is a simple line-based format where each line is one subject-predicate-object triple:

var ntwriter = new NTriplesWriter();
var path = Path.Combine(Directory.GetCurrentDirectory(),"WorkingTogether.nt");
ntwriter.Save(g,path);

If you open the file you will see something like the following:

<http://www.pandoraintelligence.com/person/Swa> <http://www.pandoraintelligence.com/relation/works_with> <http://www.pandoraintelligence.com/person/Peter> .
<http://www.pandoraintelligence.com/person/Swa> <http://www.pandoraintelligence.com/relation/works_with> <http://www.pandoraintelligence.com/person/Dave> .

The Turtle format (aka *Terse RDF Triple Language*) is a superset of N-Triples and would give something like:

person:Swa relation:works_with person:Dave, person:Peter.
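If you want the Turtle version from code as well, the CompressingTurtleWriter works just like the N-Triples writer above (a small sketch):

var ttlWriter = new CompressingTurtleWriter();
var ttlPath = Path.Combine(Directory.GetCurrentDirectory(), "WorkingTogether.ttl");
ttlWriter.Save(g, ttlPath);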

The RDF/XML format is the most verbose one and is pure XML.

var rdfw = new RdfXmlWriter();
var path = Path.Combine(Directory.GetCurrentDirectory(),"WorkingTogether.rdf");
rdfw.Save(g, path);

The IGraph interface has, of course, the features you would expect.

To get all the nodes:

Func<INode, string> justTheName = (INode n) => n.ToString().Substring(n.ToString().LastIndexOf("/") + 1);
g.Nodes.Select(n => justTheName(n));

To get the links emanating from the Swa-node:

var childrenLinks = g.GetTriplesWithSubject(g.Nodes.First());

One can use SPARQL on an in-memory graph:

var query = @"PREFIX relation: 
SELECT ?person
WHERE
{ ?s relation:works_with ?person . }";
var results = g.ExecuteQuery(query) as SparqlResultSet;
foreach (var result in results.Results)
{
Console.WriteLine(result.ToString(formatter));
}

The dotnetRDF package also comes with a set of SPARQL extensions, for example calculating a hash for each subject (i.e. for each star-shaped subgraph):

var query = @"PREFIX lfn: 
SELECT lfn:md5hash(STR(?s)) AS ?SubjectHash WHERE {?s ?p ?o} GROUP BY ?s";
var results = g.ExecuteQuery(query) as SparqlResultSet;
foreach (var result in results.Results)
{
Console.WriteLine(result.ToString());
}

which is quite a strong feature. This can be compared, to some extent, to a LINQ query against an SQL result set.
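As a rough illustration of that comparison, the same per-subject aggregation can be written as plain LINQ over the in-memory triples. This is only a sketch using the BCL's MD5, not the Leviathan function library:

using System.Security.Cryptography;
using System.Text;

// one MD5 hash per distinct subject, mirroring the SPARQL query above
string Md5Hex(string s)
{
    using var md5 = MD5.Create();
    return string.Concat(md5.ComputeHash(Encoding.UTF8.GetBytes(s)).Select(b => b.ToString("x2")));
}
var hashes = g.Triples
    .GroupBy(t => t.Subject.ToString())
    .Select(grp => new { Subject = grp.Key, Hash = Md5Hex(grp.Key) });
foreach (var h in hashes)
{
    Console.WriteLine($"{h.Subject} -> {h.Hash}");
}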

Using Wikidata

This section shows how one typically uses linked data via LINQ and SPARQL. The whole API is straightforward; the only thing you need to imprint in your mind is that a triple is organized as subject-predicate-object.

⚠ Triple <=> [Subject, Predicate, Object]

So, when using the API to filter out some nodes, you first filter the triples and then use the triple’s S-P-O structure to fetch one of the nodes or the link, as in the sketch below.
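For example, to find everyone Swa works with in the little graph from before, you first filter the triples on subject and predicate, then project out the Object position (a small sketch):

// filter the triples first ...
var swa = g.CreateUriNode("person:Swa");
var worksWith = g.CreateUriNode("relation:works_with");
// ... then pick the node you need from the subject-predicate-object structure
foreach (var colleague in g.GetTriplesWithSubjectPredicate(swa, worksWith).Select(t => t.Object))
{
    Console.WriteLine(formatter.Format(colleague));
}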

There is a complete Wikidata dump in RDF format, but to experiment here we’ll use this sample (in N-Triples format), which consists of around 1000 triples.

var wikiGraph = new Graph();
var path = Path.Combine(Directory.GetCurrentDirectory(), "sample-wikidata-terms.nt");
wikiGraph.LoadFromFile(path);
foreach (var node in wikiGraph.Nodes.Take(10))
{
    Console.WriteLine(justTheName(node));
}

Looking at the entities you can see that the information is not contained in the entity itself but everything is linked. The neighborhood of a node is its identity.

var query = @"select distinct ?s
where
{ ?s ?p ?o .
FILTER (STRSTARTS(str(?s), 'http://www.wikidata.org/entity/'))
}";

var results = wikiGraph.ExecuteQuery(query) as SparqlResultSet;
foreach (var result in results.Results.Take(10))
{
    Console.WriteLine(result.ToString(formatter));
}

Most of the nodes are informational nodes attached to an entity. For example, the Q1 node has 297 related nodes.

var q1 = wikiGraph.GetUriNode(UriFactory.Create("http://www.wikidata.org/entity/Q1"));
wikiGraph.GetTriplesWithSubject(q1).Count();

Among these 297 nodes there is a lot of redundancy in the form of localization. There are in fact only three distinct predicates:

wikiGraph.GetTriplesWithSubject(q1).Select(t=>t.Predicate.ToString()).Distinct()

and if we look at the description, for example, we get the description of the entity in 61 languages:

var description = wikiGraph.GetUriNode(UriFactory.Create("http://schema.org/description"));
wikiGraph.GetTriplesWithSubjectPredicate(q1,description).Select(t=>t.Object.ToString());

How can one link the Wikidata entities with our own data? Here things differ depending on whether you have a backend. If you have a backend store you can simply save a triple linking the two nodes. In-memory, you first need to merge the two graphs (i.e. g and wikiGraph). It’s possible to add a triple to either graph, but this will not create a link between the two graphs since they exist completely separately in memory. The optional boolean parameter when merging allows you to keep the two namespaces separate or to unify them.

Note that the dotnetRDF documentation is overall very good.

The following merges the wiki graph into our own little graph, resulting in roughly (2 + 1000) triples:

g.Merge(wikiGraph, true);
g.Triples.Count()

Now you can link things together. Here the node ‘Peter’ is linked to the Wikidata entity ‘universe’ with label ‘belongs_to’:

var peter = g.GetUriNode(UriFactory.Create("http://www.pandoraintelligence.com/person/Peter"));
var universe = g.GetUriNode(UriFactory.Create("http://www.wikidata.org/entity/Q1")); // see above
var belongsTo = g.CreateUriNode("relation:belongs_to");
g.Assert(peter, belongsTo, universe);

You can see that this info is now present by looking at the nodes linked to Peter:

g.GetTriplesWithSubject(peter);

If you wonder why the triple between Swa and Peter is not there: it’s because that triple looks like [Swa, works_with, Peter] and is hence not part of the triples with subject Peter.
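If you do want the links pointing into Peter, filter on the object position instead (a quick sketch):

// incoming links: Peter appears in the object position here
foreach (var t in g.GetTriplesWithObject(peter))
{
    Console.WriteLine(t.ToString());
}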

Finally, if you have a backend you can simply save this graph and the novel information will be persisted. Everything remains unique up to URI.

With a backend

The rest of this document shows how to deal with graphs and data when there is a linked-data server behind them. All of the servers agree on the various formats in use (Turtle, RDF/XML, …) and they all agree on the SPARQL query language. So, everything that can be done with dotnetRDF works with all of the servers (Jena, BrightstarDB, Virtuoso, …).

⚠ You need a Stardog server for the rest of this document.

var store = new StardogConnector("http://localhost:5820", "simple", "admin", "admin");

If you were using AllegroGraph, you would connect in a similar way:

var store = new AllegroGraphConnector("http://localhost:10035", "simple", "admin", "admin");

The CreateLittleGraph function from earlier creates a simple graph. You can save it to the backend simply like this:

var g = CreateLittleGraph();
store.SaveGraph(g);

Fetching a graph based on its URI is just as simple:

var fetchedGraph = new Graph();
store.LoadGraph(fetchedGraph, new Uri("http://www.pandoraintelligence.com/"));
foreach (Triple t in fetchedGraph.Triples)
{
    Console.WriteLine(t.ToString());
}

A graph fetched by its URI like this is called a named graph, and you can also list the available named graphs via SPARQL:

var q = @"select distinct ?g where {
graph ?g {
?s ?p ?o
}
}";
var results = store.Query(q) as SparqlResultSet;
foreach (SparqlResult result in results)
{
Console.WriteLine(result.ToString());
}

To fetch all subject-object pairs you can use, for example:

var q = @"select distinct ?s ?o where {
graph ?g {
?s ?p ?o
}
}";
var results = store.Query(q) as SparqlResultSet;
foreach (SparqlResult result in results)
{
Console.WriteLine(result.ToString(formatter));
}

If we now repeat the process explained in the Wikidata section and save the result into the backend:

var wikiGraph = new Graph();
var path = Path.Combine(Directory.GetCurrentDirectory(),"sample-wikidata-terms.nt");
wikiGraph.LoadFromFile(path);
g.Merge(wikiGraph, true);
store.SaveGraph(g);

Note that you can also explore things in the Stardog management console:

![Stardog management console](quiver-image-url/8FFCC87DE8471698C600A14757F2FF2A.jpg)

Now we’ll show that one can create information in the store by saving triples. Assuming that you know the URIs of the Peter and universe nodes (e.g. via a SPARQL query), you can proceed like so:

var workingGraph = new Graph();
workingGraph.NamespaceMap.AddNamespace("person", UriFactory.Create("http://www.pandoraintelligence.com/person/"));
workingGraph.NamespaceMap.AddNamespace("relation", UriFactory.Create("http://www.pandoraintelligence.com/relation/"));
workingGraph.NamespaceMap.AddNamespace("wikidata", UriFactory.Create("http://www.wikidata.org/entity/"));
var peter = workingGraph.CreateUriNode("person:Peter");
var belongsTo = workingGraph.CreateUriNode("relation:belongs_to");
var universe = workingGraph.CreateUriNode("wikidata:Q1");
workingGraph.Assert(peter, belongsTo, universe);
workingGraph.Triples.ToArray().First();

Save this to the backend and convince yourself that this information is now there, exactly once. That is, you can create a graph in memory and be sure that the nodes already in the backend are reused correctly based on their URI. A different URI is a different node.

store.SaveGraph(workingGraph);
var query = @"
PREFIX relation: 
select * where {
?s relation:belongs_to ?o .
}";
var results = store.Query(query) as SparqlResultSet;
foreach (SparqlResult result in results.Take(10))
{
Console.WriteLine(result.ToString(formatter));
}

Inference in-memory

Inference can occur in-memory or in the backend. In both cases the inference is driven by rules defined in RDF (or a related format). A simple example is the hierarchy

Vehicle – Car – SportsCar

which can be described in Turtle format like this:

:Vehicle a rdfs:Class .
:Car rdfs:subClassOf :Vehicle .
:SportsCar rdfs:subClassOf :Car .
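The example below loads two files, CarSchema.ttl and CarData.ttl, whose exact contents are not reproduced in this document. A minimal sketch of what they could contain (the ex: namespace URI is an assumption) would be, for CarSchema.ttl:

@prefix ex:   <http://example.org/vehicles/> .   # assumed namespace
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:Vehicle   a rdfs:Class .
ex:Car       rdfs:subClassOf ex:Vehicle .
ex:SportsCar rdfs:subClassOf ex:Car .

and for CarData.ttl:

@prefix ex: <http://example.org/vehicles/> .   # assumed namespace

ex:FordFiesta  a ex:Car .
ex:AudiA8      a ex:Car .
ex:FerrariEnzo a ex:SportsCar .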

Below you can see how, without the additional schema info, a SportsCar fails to be recognized as a Car, and how applying a simple RDFS reasoner (which handles the subclass transitivity) fixes this:

//First we want to load our data and schema into Graphs
Graph data = new Graph();
FileLoader.Load(data, "CarData.ttl");
Graph schema = new Graph();
FileLoader.Load(schema, "CarSchema.ttl");

//Now we ask for things which are cars from our data Graph
IUriNode rdfType = data.CreateUriNode(new Uri(RdfSpecsHelper.RdfType));
IUriNode car = data.CreateUriNode("ex:Car");
Console.WriteLine("Without inference:\n");
foreach (Triple t in data.GetTriplesWithPredicateObject(rdfType, car))
{
    Console.WriteLine(t.ToString());
}
//This will result in the triples defining the type for :FordFiesta
//and :AudiA8 being printed
//BUT without inference we don't know that :FerrariEnzo is a car

//So now we'll go ahead and apply inference
StaticRdfsReasoner reasoner = new StaticRdfsReasoner();
reasoner.Initialise(schema);
reasoner.Apply(data);
Console.WriteLine("\nWith inference:\n");
//Now we ask for things which are cars again
foreach (Triple t in data.GetTriplesWithPredicateObject(rdfType, car))
{
    Console.WriteLine(t.ToString());
}
//This time it will have printed :FerrariEnzo as well as it will have inferred that anything
//which is of type ex:SportsCar is also of type ex:Car

Inference in the backend

Inference can also happen in the backend on the fly. That is, inferred triples can be switched on or off when sending a SPARQL query.

To show it in action we’ll use a simple gender inference based on a Facebook profile node with gender info.

![Facebook profile node with gender info](quiver-image-url/EB7ED51664BB77208621EDA3788EEA5D.jpg)

In the ontology we have the class ‘men’ as a sub-class of ‘person’, but we do not assign this class to the individual ‘Swa’. The inference rule, however, says that if a person has a Facebook profile and that profile has gender ‘male’, the person is inferred to be in the class ‘men’.

![The inferred membership of ‘Swa’ in the class ‘men’](quiver-image-url/253FF154BEBC834A1007B79C8B03E616.jpg)

The way this is done in RDF/Turtle is not very difficult and is easily done in the Protégé UI:

![Defining the inference rule in the Protégé UI](quiver-image-url/1686B2EAFA9F1B2314AF450961F7D740.jpg)
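If you prefer to see the rule itself rather than the Protégé screenshot, the core of it is an equivalent-class restriction. The following is only a hedged sketch: the property names (has_profile, gender) and the namespace are assumptions, and the real definition lives in Inference.owl:

@prefix :     <http://www.pandoraintelligence.com/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# 'men' is a subclass of 'person' and is equivalent to
# "anything having a profile whose gender is 'male'" (names are assumptions)
:men a owl:Class ;
     rdfs:subClassOf :person ;
     owl:equivalentClass [
         a owl:Restriction ;
         owl:onProperty :has_profile ;
         owl:someValuesFrom [
             a owl:Restriction ;
             owl:onProperty :gender ;
             owl:hasValue "male"
         ]
     ] .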

The ontology is in a file called ‘Inference.owl’:

var g = new Graph();
FileLoader.Load(g, "Inference.owl");
store.SaveGraph(g);

Below we ask via SPARQL for the things related to the node ‘Swa’. Inference is switched on or off with a simple boolean parameter when the query is sent:

var store = new StardogConnector("http://localhost:5820", "simple", "admin", "admin");

var q = @"select ( as ?s) ?p ?o
{
{  ?p ?o }
union
{ graph ?g {  ?p ?o } }
}";
// without inference
var infer = false;
var results = store.Query(q, infer) as SparqlResultSet;
if (results == null)
{
    Assert.Fail("No triples returned.");
}
Console.WriteLine("Without inference:\n");
foreach (var result in results)
{
    Console.WriteLine(result.ToString(formatter));
}
Assert.AreEqual(3, results.Count);

infer = true;

results = store.Query(q, infer) as SparqlResultSet;
if (results == null)
{
    Assert.Fail("No triples returned.");
}
Console.WriteLine("\nWith inference the person subtype 'men' is inferred:\n");
foreach (var result in results)
{
    Console.WriteLine(result.ToString(formatter));
}
Assert.AreEqual(4, results.Count);

Custom inference

In the backend there are various inference engines. This is the case for all vendors, and every inference engine has its strengths; it all depends on what you are looking for.

From the point of view of dotnetRDF there are various engines as well, all implementing the IInferenceEngine interface. This interface is really not magical: you get a graph and you can do whatever you like with it.

In the implementation below, the Swa-node is looked for and then adorned with a tag if found:

public class AdornSwaNode : IInferenceEngine
{
    /// <summary>
    /// This method can alter the given graph.
    /// </summary>
    public void Apply(IGraph g)
    {
        // if the Swa-node sits in the given graph we add custom info to it
        var found = g.GetUriNode(UriFactory.Create("http://www.pandoraintelligence.com/Swa"));
        if (found != null)
        {
            // add a custom namespace as an example
            // you can also use the existing ones
            g.NamespaceMap.AddNamespace("tags", UriFactory.Create($"{g.BaseUri}tags"));
            g.NamespaceMap.AddNamespace("default", UriFactory.Create($"{g.BaseUri}"));
            // we will create this: Swa-(has_tag)->tags:Funky
            var tag = g.CreateUriNode("tags:/Funky");
            var link = g.CreateUriNode("default:has_tag");
            g.Assert(found, link, tag);
        }
        // we do not change the graph if the Swa-node is not there
    }

    public void Apply(IGraph input, IGraph output)
    {
        throw new NotImplementedException();
    }

    public void Initialise(IGraph g)
    {
        throw new NotImplementedException();
    }
}

To use this engine, fetch the named graph we created before and apply the engine:

var store = new StardogConnector("http://localhost:5820", "simple", "admin", "admin");
var g = new Graph();
// the ontology we saved above is one graph with name 'http://www.pandoraintelligence.com'
store.LoadGraph(g, "http://www.pandoraintelligence.com");
// apply the custom inference
var adorner = new AdornSwaNode();
adorner.Apply(g);

// check that the adorner added new info to the Swa-node
Func<INode, bool> isSwa = node => (node is UriNode) && (node as UriNode).Uri.ToString() == "http://www.pandoraintelligence.com/Swa";
Func<INode, bool> isFunky = node => (node is UriNode) && (node as UriNode).Uri.ToString() == "http://www.pandoraintelligence.com/tags/Funky";

var found = g.Triples.Count(t => isSwa(t.Subject) && isFunky(t.Object));

Assert.IsTrue(found == 1);

Now, it’s clear that one does not really need the interface. You can simply fetch a graph, manipulate it and then save it again as well.
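For completeness, the interface-free version of the same idea is just a few lines (a sketch reusing the Stardog connection from above; the has_tag and Funky URIs are the same ones the engine used):

var plainGraph = new Graph();
store.LoadGraph(plainGraph, "http://www.pandoraintelligence.com");
// manipulate the graph directly, e.g. tag the Swa-node
var swaNode = plainGraph.GetUriNode(UriFactory.Create("http://www.pandoraintelligence.com/Swa"));
if (swaNode != null)
{
    var hasTag = plainGraph.CreateUriNode(UriFactory.Create("http://www.pandoraintelligence.com/has_tag"));
    var funky = plainGraph.CreateUriNode(UriFactory.Create("http://www.pandoraintelligence.com/tags/Funky"));
    plainGraph.Assert(swaNode, hasTag, funky);
}
// and persist the result again
store.SaveGraph(plainGraph);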