Graph Databases and Big Data

Graph stores can be divided into property graphs, triple stores and hybrid (or exotic) systems. Each flavor has its pros and cons; which one fits depends on your business context and what you wish to achieve. One could also classify graph stores by their feature set, basic versus enterprise. Unlike the relational world, there is a much broader spectrum of ‘quality’ here: features such as encryption, clustering and high availability are not present in every system. The triple world, in particular, can be more academic than enterprise-minded. This is also reflected in prices and licensing.

As such, picking the right solution for your project can be a challenge in its own right. The diversity and lack of coherence can be overwhelming. We have gone through this process time and again over the years; if you need guidance, we’re here to help.

 

Property Graphs

In property graphs, the nodes (and edges) carry a payload, usually in the form of JSON-like key/value properties. Some implementations enforce a schema, some don’t, and some do so only partially. In the context of graph learning, one speaks of homogeneous and heterogeneous graphs, depending on whether the schema is uniform across the graph elements or not. Property graphs have much in common with other NoSQL systems (like MongoDB), but they come with a much broader range of API flavors, graph query languages and levels of enterprise readiness.
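To make the payload idea concrete, here is a minimal Python sketch of an in-memory property graph. The `Node`/`Edge` classes and the `is_homogeneous` helper are hypothetical illustrations, not the API of any product discussed below. A graph is treated as homogeneous here when all nodes share a single label and all edges a single relationship type, which is one common reading of the graph-learning terminology.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    id: str
    label: str
    props: Dict[str, Any] = field(default_factory=dict)  # the node's payload


@dataclass
class Edge:
    src: str
    dst: str
    rel_type: str
    props: Dict[str, Any] = field(default_factory=dict)  # edges carry a payload too


def is_homogeneous(nodes: List[Node], edges: List[Edge]) -> bool:
    """True when every node has the same label and every edge the same type."""
    return len({n.label for n in nodes}) <= 1 and len({e.rel_type for e in edges}) <= 1


# Schema-free payloads: "bob" simply has no "age" property.
nodes = [
    Node("alice", "Person", {"name": "Alice", "age": 34}),
    Node("bob", "Person", {"name": "Bob"}),
    Node("acme", "Company", {"name": "Acme", "founded": 1999}),
]
edges = [
    Edge("alice", "bob", "KNOWS", {"since": 2015}),
    Edge("alice", "acme", "WORKS_AT", {"role": "engineer"}),
]

print(is_homogeneous(nodes, edges))          # → False (mixed labels and types)
print(is_homogeneous(nodes[:2], edges[:1]))  # → True (only Person nodes, KNOWS edges)
```

A store that enforces a schema would reject or validate the `props` dictionaries; this sketch deliberately accepts anything, which is the schema-free end of the spectrum.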

 

Neo4j: The Internet-Scale Graph Platform

Neo4j’s Graph Platform brings a connections-first approach to applications and analytics across the enterprise.

It has become the go-to solution for many companies dealing with graph-like data.

 

 

JanusGraph: Distributed, open source, massively scalable graph database

JanusGraph is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster.

JanusGraph is a project under The Linux Foundation, and includes participants from Expero, Google, GRAKN.AI, Hortonworks, IBM and Amazon.

 

TigerGraph: The Only Scalable Graph Database for The Enterprise

Through its Native Parallel Graph technology, the TigerGraph platform represents what’s next in the graph database evolution: a complete, distributed, parallel graph computing platform supporting web-scale data analytics in real-time.

Combining the best ideas (MapReduce, Massively Parallel Processing, and fast data compression/decompression) with fresh development, TigerGraph delivers what you’ve been waiting for: the speed, scalability, and deep exploration/querying capability to extract more business value from your data.

 

Spark GraphFrames

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs. It provides high-level APIs in Scala, Java, and Python. It aims to provide both the functionality of GraphX and extended functionality taking advantage of Spark DataFrames. This extended functionality includes motif finding, DataFrame-based serialization, and highly expressive graph queries.
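Motif finding is easiest to grasp with an example. In GraphFrames, a motif such as `(a)-[e]->(b); (b)-[e2]->(a)` is passed to `GraphFrame.find` and matches pairs of vertices connected in both directions. The sketch below reproduces what that particular motif computes, in plain Python on a hypothetical edge list, using a self-join on the edges as a stand-in for DataFrame joins. It is not GraphFrames’ own implementation and does not parse general motif strings.

```python
# Hypothetical edge list as (src, dst) pairs, standing in for the "edges" DataFrame.
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("d", "c"), ("a", "c")]


def find_mutual(edges):
    """Match the motif (x)-[e]->(y); (y)-[e2]->(x) via a self-join on the edge list."""
    # Index edges by source so the second pattern can be joined on e.dst == e2.src.
    by_src = {}
    for s, d in edges:
        by_src.setdefault(s, []).append(d)

    matches = []
    for x, y in edges:                  # first pattern: (x)-[e]->(y)
        for back in by_src.get(y, []):  # second pattern starts at y ...
            if back == x:               # ... and must end back at x
                matches.append((x, y))
    return matches


print(find_mutual(edges))
```

Each mutual pair shows up twice, once per orientation; filter on something like `x < y` if you only want one of them.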

 

Grakn: the intelligent knowledge graph.

Grakn is a knowledge graph engine that organises complex networks of data and makes them queryable by performing knowledge engineering. Rooted in Knowledge Representation and Automated Reasoning, Grakn provides the knowledge foundation for cognitive and intelligent (e.g. AI) systems, by providing an intelligent language for modelling, transactions and analytics. Being a distributed database, Grakn is designed to scale over a network of computers through partitioning and replication.

 

AnzoGraph DB: Take on new analytical challenges with this market-leading graph analytics database.

AnzoGraph DB is a Massively Parallel Processing (MPP) native graph database built for analytics at scale (trillions of triples and more), speed and deep link insights. Use it for embedded analytics that require graph algorithms, graph views, named queries, aggregates, built-in data science functions, data warehouse-style BI and reporting functions.

 

RedisGraph – a graph database module for Redis

RedisGraph is the first queryable Property Graph database to use sparse matrices to represent the adjacency matrix in graphs and linear algebra to query the graph.

  • Based on the Property Graph Model
  • Nodes (vertices) and Relationships (edges) that may have attributes
  • Nodes that can be labeled
  • Relationships have a relationship type
  • Graphs represented as sparse adjacency matrices
  • Cypher as query language
  • Cypher queries translated into linear algebra expressions
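The sparse-matrix approach in the list above can be illustrated with a few lines of Python. RedisGraph itself delegates this work to the GraphBLAS library; the sketch below is only a plain-Python stand-in, assuming a dict-of-sets layout in which only the non-zero entries of the adjacency matrix are stored. Expanding a frontier by one hop is then a boolean vector-matrix product (OR instead of +, AND instead of ×), and a two-hop pattern like `(a)-->()-->(c)` becomes two such products.

```python
from typing import Dict, Set

# Sparse boolean adjacency matrix: row index -> set of column indices holding a 1.
SparseMatrix = Dict[int, Set[int]]


def vxm(frontier: Set[int], adj: SparseMatrix) -> Set[int]:
    """Boolean vector-matrix product over the (OR, AND) semiring.

    Entry j of the result is 1 iff some i in the frontier has adj[i][j] == 1,
    i.e. the result is the set of nodes reachable in exactly one hop.
    """
    result: Set[int] = set()
    for i in frontier:
        result |= adj.get(i, set())
    return result


def k_hop(start: int, adj: SparseMatrix, k: int) -> Set[int]:
    """Nodes reachable in exactly k hops: the start vector multiplied by A, k times."""
    frontier = {start}
    for _ in range(k):
        frontier = vxm(frontier, adj)
    return frontier


# Edges 0->1, 0->2, 1->3, 2->3, 3->4; only the 1s of the matrix are stored.
adj: SparseMatrix = {0: {1, 2}, 1: {3}, 2: {3}, 3: {4}}

print(k_hop(0, adj, 1))  # → {1, 2}
print(k_hop(0, adj, 2))  # → {3}
```

Storing only the non-zero entries is what keeps this viable for large, sparse graphs: the work per hop is proportional to the edges leaving the frontier, not to the square of the node count.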

ArangoDB: Distributed Joins, Streaming Transactions, extended GraphDB & Search Capabilities

By uniting graph, document, and key/value in a single core with the same query language, along with a full-text search and ranking engine, ArangoDB provides the flexibility to easily apply the data models you need.

Triple Stores and Semantic Databases

Triple stores are very different from property graphs: nodes and edges don’t carry a payload. Instead, everything is stored as ‘triples’ of the form (subject, predicate, object). A triple is a basic arrow from a subject to an object, labeled by the predicate; this gives a lot of flexibility and, at the same time, demands a very different way of thinking. SPARQL, the W3C semantic query language, is implemented consistently across vendors, but each extends it in the form of custom functions.
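A toy triple store shows how ‘everything sits in triples’ plays out when querying. The Python sketch below keeps plain (subject, predicate, object) tuples and evaluates a basic graph pattern, the core of a SPARQL `WHERE` clause, by matching each triple pattern in turn and joining the variable bindings. The `ex:` names are hypothetical, and real stores add indexes, datatypes and the full SPARQL algebra on top.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical data: even the names are just objects of further triples.
triples: List[Triple] = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:bob", "ex:name", '"Bob"'),
    ("ex:carol", "ex:name", '"Carol"'),
]


def match(pattern: Triple, binding: Binding, data: Iterable[Triple]) -> List[Binding]:
    """Extend `binding` in every way `pattern` matches a stored triple.

    Terms starting with '?' are variables; any other term must match exactly.
    """
    results = []
    for triple in data:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or reject if it is already bound to another value.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            results.append(new)
    return results


def query(patterns: List[Triple], data: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern by joining the solutions of each triple pattern."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        solutions = [b for s in solutions for b in match(pattern, s, data)]
    return solutions


# Roughly: SELECT ?p ?n WHERE { ex:alice ex:knows ?p . ?p ex:name ?n }
print(query([("ex:alice", "ex:knows", "?p"), ("?p", "ex:name", "?n")], triples))
```

Note that there is no ‘node’ object anywhere: Bob’s name is reachable only by following another arrow, which is exactly the shift in thinking mentioned above.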

The world of triples is more closely tied to academic research and government projects than the more business-minded world of property graphs. At the same time, many semantic stores present themselves as ‘universal’, in the sense that they try to embrace the diversity of data sources found in a typical enterprise.

 

Stardog: enterprise data unification.

A knowledge graph is only as powerful as the data it can access, so we’ve built database connectors to make it easier to unify your data. Stardog has connectors for all major SQL systems and the most popular NoSQL databases. In addition, we built the BITES pipeline to process your unstructured data like research papers, resumes, and regulatory documents. BITES uses NLP entity recognition to identify and extract concepts, adding those data relationships into the knowledge graph. Once your data is unified in Stardog, you can see a 360-degree view of each data point.

 

MarkLogic: Data integration. Simplified.

Ingest data into MarkLogic as is, without worrying about predefined schemas and complex ETL. MarkLogic’s flexible, multi-model approach lets you bring in data from anywhere—relational databases, mainframes, fileservers, Hadoop—or any other source. It’s that easy.

 

Virtuoso: Data-driven agility without compromise

Virtuoso is a revolutionary, next-generation, high-performance virtual database engine for the Distributed Computing Age. It is a core universal data access technology set to accelerate our advances into the emerging Information Age.

Virtuoso provides transparent access to your existing data sources, which are typically databases from different database vendors.

Through a single connection, Virtuoso will simultaneously connect your ODBC, JDBC, UDBC, OLE-DB client applications and services to data within Oracle, Microsoft SQL Server, DB/2, Informix, Progress, CA-Ingres and other ODBC compliant database engines. All your databases are treated as a single logical unit.

 

GraphDB: Semantic Technologies for Smarter Information Retrieval and Content Management

GraphDB is an enterprise-ready Semantic Graph Database, compliant with W3C Standards. Semantic graph databases (also called RDF triplestores) provide the core infrastructure for solutions where modelling agility, data integration, relationship exploration and cross-enterprise data publishing and consumption are important.

 

Apache Jena: a free and open source Java framework for building Semantic Web and Linked Data applications.

Apache Fuseki is the web-enabled (REST) layer on top of the Jena framework. If you don’t need high-end enterprise features, this is your best bet.

 

Blazegraph is an ultra-scalable, high-performance graph database with support for the Blueprints and RDF/SPARQL APIs.

Graphs are a powerful, flexible means of representing all kinds of linked data. Graph applications show up everywhere: from knowledge graphs, community detection and clustering, to failure detection in the Internet of Things (IoT), genomics and biology for drug discovery and precision medicine, and cyber security and national defense. Blazegraph is a high-performance graph database platform that supports RDF/SPARQL APIs and the Apache TinkerPop™ stack with scalable solutions.

 

Cayley: open-source graph database inspired by the graph database behind Freebase and Google’s Knowledge Graph.

The database is built with RDF support, including multiple linked data formats such as N-Quads and JSON-LD.

 

Amazon Neptune: Fast, reliable graph database built for the cloud

Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets. The core of Amazon Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latency. Amazon Neptune supports the popular graph models Property Graph and W3C’s RDF, and their respective query languages Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune powers graph use cases such as recommendation engines, fraud detection, knowledge graphs, drug discovery, and network security.

Hybrid and Exotic Systems

In this category one finds, among others, relational systems that implement a graph API on top of their underlying tabular structure. This has both advantages and drawbacks. Microsoft and Oracle are at the forefront, but MySQL and PostgreSQL also have graph layers. Typically you will find Gremlin as the query language, but some vendors, like Microsoft, have managed to integrate graph-specific query constructs into the SQL language itself.
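The core idea of a graph layer over tables fits in a few lines. This is not how any specific vendor implements it (SQL Server, for instance, has dedicated NODE and EDGE tables and a `MATCH` clause); it is a minimal sketch using SQLite from Python’s standard library, with hypothetical `node` and `edge` tables, where a traversal becomes a recursive common table expression that repeatedly joins the edge table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A graph stored as two plain relational tables (hypothetical schema).
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE edge (src INTEGER REFERENCES node(id),
                       dst INTEGER REFERENCES node(id),
                       label TEXT NOT NULL);
    INSERT INTO node VALUES (1, 'alice'), (2, 'bob'), (3, 'carol'), (4, 'dave');
    INSERT INTO edge VALUES (1, 2, 'knows'), (2, 3, 'knows'), (3, 4, 'knows');
""")


def reachable(conn, start_name, max_depth):
    """Names reachable from start_name via 'knows' edges within max_depth hops."""
    sql = """
        WITH RECURSIVE walk(id, depth) AS (
            SELECT id, 0 FROM node WHERE name = ?
            UNION
            -- Each recursion step is one hop: a join between the frontier and edge.
            SELECT e.dst, w.depth + 1
            FROM walk AS w JOIN edge AS e ON e.src = w.id
            WHERE e.label = 'knows' AND w.depth < ?
        )
        SELECT DISTINCT n.name
        FROM walk AS w JOIN node AS n ON n.id = w.id
        WHERE w.depth > 0
        ORDER BY n.name
    """
    return [row[0] for row in conn.execute(sql, (start_name, max_depth))]


print(reachable(conn, "alice", 2))  # → ['bob', 'carol']
```

The depth bound keeps the recursion finite even if the data contains cycles. It also hints at the trade-off mentioned above: every extra hop is another join, which is exactly the cost native graph stores try to avoid.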

 

Azure Cosmos DB: Microsoft’s globally distributed, multi-model database service

Azure Cosmos DB is Microsoft’s proprietary globally distributed, multi-model database service “for managing data at planet-scale” launched in May 2017. It is schema-agnostic, horizontally scalable and generally classified as a NoSQL database.

 

Graph processing with SQL Server and Azure SQL Database

SQL Server offers graph database capabilities to model many-to-many relationships. The graph relationships are integrated into Transact-SQL and receive the benefits of using SQL Server as the foundational database management system.

Other organizations and projects known to offer graph solutions

The graph store landscape evolves rapidly, and its diversity, compared to RDBMSs, is staggering. There is something for everyone; it all depends on your business aims and budget.

  • Algebraix

  • Amazon

  • Amisa Server

  • BrightstarDB

  • Cayley

  • CubicWeb

  • Dydra

  • FlockDB

  • Giraph

  • GlobalsDB

  • GraphBase

  • Hama

  • HyperGraphDB

  • InfoGrid

  • JanusGraph

  • LexisNexis

  • MarkLogic

  • Mulgara

  • Oracle

  • Pregel

  • Redland

  • RedStore

  • SparkSee

  • SparqlDB

  • Sqrrl

  • Strabon

  • Teradata

  • VelocityGraph

  • Virtuoso