Memgraph
Memgraph is a very promising product and I sincerely hope it will mature in the coming years to become an enterprise-ready alternative to Neo4j. I must admit, I really like it. It’s not battle-ready yet and still very much in flux, but if the Memgraph team manages to mature it, it will be stellar.
Before highlighting the objective pros and cons, let me mention the things that I find appealing:
- They have an amazing collection of articles, not just about Memgraph but about lots of tangential topics.
- Their Discord channel is quite active and the Memgraph folks are highly responsive.
- The strategic choices are just right: the technology, the forward-looking features, the focus on developer features and Cypher as a query language.
Memgraph is also a lot of fun, it invites one to explore and to dive into all sorts of graph applications.
That said, I would not advise any customer to use it. Not yet. After using it for quite a while and in various ways, it’s clear that it will take a few more years to become a good alternative to Neo4j and to effectively do what the documentation describes. For example, there are tons of articles on all sorts of graph analytics topics, but many functions (e.g. cycle detection and other Mage methods) do not work properly. Things also often seem to break down when the graph becomes larger than a few thousand nodes. For instance, I tried to analyse the Hetionet dataset (around 2M edges) but it failed miserably. The Memgraph people are helpful but it’s clear it all requires some more TLC.
The same goes for the hosted version, Memgraph Cloud. Various things fail, connections drop randomly and I would not advise any customer to run their production workloads on it. Again, I do believe it’s going in the right direction but it ain’t there yet.
On a more factual level, these are the things you need to consider when evaluating Memgraph:
Pro
- Both Memgraph and Mage can run on GPU by way of cuGraph (part of NVIDIA RAPIDS). This is a huge plus in comparison with Neo4j, which runs its ML workloads on CPU only.
- Mage sits outside the database, which is far better than Neo4j's approach of doing ML inside the database. The support for Python out of the box is also a huge advantage if you are into graph machine learning.
- Really good and extensive documentation.
- GQLAlchemy is an Object-Graph Mapper (OGM) which also works with Neo4j.
- Fast, in-memory with transactions written to disk.
- Custom procedures in Python rather than Java. C++ is also supported.
- Written in C++, very fast in-memory processing
- Because procedures callable from queries can be written in Python, it was possible to integrate NetworkX. You can use NetworkX functionality against the database, which extends the library from small in-process graphs to a full graph database.
- Streaming graph analytics
- Clustering
- Triggers
- Drivers for all main languages
- Apache Kafka ready
- Security roles
- LDAP integration
- Cloud offering
- Open source (550 stars)
Con
- Single database, one graph. Not multi-tenant and if you want another graph you need to dump the current one.
- In-memory means at any instant you have a very fast graph but it’s just one single graph
- No schema support. No ontology enforced. In this respect, TigerGraph remains pretty much the only (property graph) database doing it right.
- Linux only, with macOS and Windows supported via Docker.
- Low adoption but growing.
- Small but strong community support
- Limited query profiling
- Missing enterprise breadth and technical solutions (like CDC)
None of the disadvantages is necessarily an issue, but the fact that listed features do not always work, or fail on (not very) large graphs, is something to take seriously.
Memgraph is a fantastic graph database with a bright future. It implements OpenCypher and can be a drop-in replacement for Neo4j if you are willing to overlook some growing pains. It’s a typical open source project with serious investors (Microsoft among others) and comes with streaming analytics and heaps of really nice documentation.
The biggest drawback (at this point in time, at least) is the one-database limitation. Memgraph holds the graph in memory, which makes it very fast and enables streaming analytics. It continuously backs up data to disk with transaction logs and periodic snapshots. On restart, it uses the snapshot and log files to recover its state to what it was before shutting down. So, unlike RedisGraph for instance, in-memory does not mean the data is volatile.
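To make the snapshot-plus-log idea concrete, here is a toy sketch in plain Python. It is not Memgraph's implementation or file format; the class name `TinyDurableGraph`, the file names and the JSON encoding are all made up for illustration. It only shows the general recovery recipe: load the latest snapshot, then replay the operations logged after it.

```python
import json
import os
import tempfile


class TinyDurableGraph:
    """Toy in-memory edge store made durable by a snapshot plus an append-only log.

    Hypothetical illustration only; unrelated to Memgraph's actual on-disk format.
    """

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "wal.jsonl")
        self.edges = set()
        self._recover()

    def _recover(self):
        # Step 1: load the most recent snapshot, if there is one.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.edges = {tuple(edge) for edge in json.load(f)}
        # Step 2: replay every operation that was logged after that snapshot.
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    op, src, dst = json.loads(line)
                    self._apply(op, src, dst)

    def _apply(self, op, src, dst):
        # Both operations are idempotent, so replaying one twice is harmless.
        if op == "add":
            self.edges.add((src, dst))
        elif op == "remove":
            self.edges.discard((src, dst))

    def _write(self, op, src, dst):
        # Append to the log and force it to disk before touching memory.
        with open(self.log_path, "a") as f:
            f.write(json.dumps([op, src, dst]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(op, src, dst)

    def add_edge(self, src, dst):
        self._write("add", src, dst)

    def remove_edge(self, src, dst):
        self._write("remove", src, dst)

    def snapshot(self):
        # Write the full state to a temporary file, swap it in atomically,
        # then empty the log because the snapshot now covers its contents.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(list(self.edges), f)
        os.replace(tmp_path, self.snapshot_path)
        open(self.log_path, "w").close()


def demo():
    """Write some edges, snapshot, write more, then simulate a restart."""
    with tempfile.TemporaryDirectory() as directory:
        graph = TinyDurableGraph(directory)
        graph.add_edge("a", "b")
        graph.add_edge("b", "c")
        graph.snapshot()
        graph.add_edge("c", "a")      # only in the log
        graph.remove_edge("a", "b")   # only in the log
        restarted = TinyDurableGraph(directory)  # fresh object, same files
        return restarted.edges
```

Calling `demo()` returns the two edges that survive, `("b", "c")` from the snapshot and `("c", "a")` from the replayed log, even though the second object never saw the original writes in memory.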
Besides speed, the Python/C++ stack means that custom procedures can be written in Python and that NetworkX algorithms can be called from queries. This is a huge productivity gain and appeals to data scientists. Neo4j does not run on GPU and implements its machine learning within the database. Both points render Neo4j weak in a data science context. Memgraph, on the other hand, does run on GPU and its Mage extensions run outside the database.
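As a rough idea of what a Python custom procedure looks like, here is a minimal sketch of a query module. It assumes Memgraph's `mgp` Python API as I remember it (`mgp.read_proc`, `mgp.ProcCtx`, `mgp.Record`, `ctx.graph.vertices`, `in_edges`/`out_edges`), so check the current documentation before relying on it. The module file name `degree_stats.py` and the procedure name `histogram` are hypothetical. The counting logic is kept in a plain function so it can be tested without a running database.

```python
from collections import Counter

try:
    # Assumption: mgp is provided by Memgraph's embedded Python runtime.
    import mgp
except ImportError:
    # Outside Memgraph (e.g. in unit tests) only the pure function is usable.
    mgp = None


def degree_histogram(degrees):
    """Map each degree value to the number of nodes that have it."""
    return dict(Counter(degrees))


if mgp is not None:

    @mgp.read_proc
    def histogram(ctx: mgp.ProcCtx) -> mgp.Record(degree=int, nodes=int):
        # Total degree of a vertex = incoming edges + outgoing edges.
        degrees = (
            sum(1 for _ in vertex.in_edges) + sum(1 for _ in vertex.out_edges)
            for vertex in ctx.graph.vertices
        )
        # One result row per distinct degree, sorted by degree.
        return [
            mgp.Record(degree=degree, nodes=count)
            for degree, count in sorted(degree_histogram(degrees).items())
        ]
```

If the file were saved as `degree_stats.py` in Memgraph's query modules directory, the procedure would presumably be invoked from Cypher as `CALL degree_stats.histogram() YIELD degree, nodes RETURN degree, nodes;`. Compare this with Neo4j, where the same thing means writing and deploying a Java plugin.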