Frequently asked questions

General

What is Dgraph?

Dgraph is a distributed, low-latency, high-throughput graph database written in Go. It emphasizes good design, concurrency, and minimizing the network calls required to execute a query in a distributed environment.

Why build Dgraph?

We think graph databases are currently second-class citizens. They are not considered mature enough to run as the sole database, so they get run alongside other SQL/NoSQL databases. We're also not happy with the design decisions of existing graph databases, which are either non-native or non-distributed, don't manage the underlying data, or suffer from performance issues.

Why would I use Dgraph?

If you're interested in a high-performance graph database with a lot of emphasis on great design and resilience, and you're not afraid to experiment with cutting-edge technologies, you should consider Dgraph. At this stage, Dgraph is ideal for internal, non-user-facing projects.

If you're running more than five tables in a MySQL database with more than five foreign keys, your data might be better served by a graph database. If you're running on a NoSQL database (MongoDB or Cassandra) and having to do joins in the application layer, you should be using a graph database instead.

Why would I not use Dgraph?

If you're looking for a stable, mature database, Dgraph wouldn't be the right fit for you. It is at an early stage, where a lot of functionality is still being worked on, and releases might not be backward compatible.

Also, if your data doesn't have a graph structure, i.e., there's only one predicate, then a graph database might not be a good fit for you. A NoSQL datastore is best for key-value type storage.

Is Dgraph production ready?

We recommend using Dgraph for internal projects at companies. Minor releases at this stage might not be backward compatible, so we highly recommend taking frequent backups.

Is Dgraph fast?

Dgraph has been at least 10x faster than every other graph system I've run it against, and the gap only goes up from there. But those are my own observations. We have a couple of folks doing a thorough benchmarking of Dgraph against Cayley and Neo4j, which we'll publish blog posts about soon. See the relevant GitHub issues here[1].

Internals

Why does Dgraph use RocksDB?

RocksDB, despite its name, isn't a database. It's an application library that helps with key-value storage on disk. Dgraph uses RocksDB to store posting lists on disk. It doesn't, however, rely on RocksDB for anything else. All the data handling happens at the Dgraph level. RocksDB is only an interface to disk (much like a file library).
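To make the idea of posting lists in a key-value store a bit more concrete, here is a minimal sketch in Go. It is only an illustration under stated assumptions: `kvStore` is a hypothetical in-memory map standing in for the on-disk library (it is not the RocksDB API), and the key layout in `postingKey` is invented for this example rather than taken from Dgraph's code.

```go
package main

import (
	"fmt"
	"sort"
)

// kvStore is a hypothetical stand-in for an on-disk key-value library.
// It is a plain in-memory map, not the RocksDB API.
type kvStore map[string][]uint64

// postingKey builds an illustrative key from a predicate and a subject UID.
// This layout is an assumption made for the sketch.
func postingKey(predicate string, subject uint64) string {
	return fmt.Sprintf("%s|%d", predicate, subject)
}

// addEdge inserts object into the sorted posting list stored under
// (predicate, subject), skipping duplicates.
func (s kvStore) addEdge(predicate string, subject, object uint64) {
	key := postingKey(predicate, subject)
	list := s[key]
	// Binary search for the insertion point that keeps the list sorted.
	i := sort.Search(len(list), func(j int) bool { return list[j] >= object })
	if i < len(list) && list[i] == object {
		return // already present
	}
	list = append(list, 0)
	copy(list[i+1:], list[i:])
	list[i] = object
	s[key] = list
}

// neighbors returns the posting list for (predicate, subject).
func (s kvStore) neighbors(predicate string, subject uint64) []uint64 {
	return s[postingKey(predicate, subject)]
}

func main() {
	store := kvStore{}
	store.addEdge("friend", 1, 7)
	store.addEdge("friend", 1, 3)
	store.addEdge("friend", 1, 7) // duplicate, ignored
	fmt.Println(store.neighbors("friend", 1)) // prints [3 7]
}
```

All the list handling (sorting, de-duplication) lives in the application code here, and the store only maps keys to values, which mirrors the division of labor described above.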

Why doesn't Dgraph use BoltDB?

BoltDB acquires a single global RWMutex lock for all reads and writes. This negatively affects the concurrency of iteration and modification of posting lists in Dgraph, so we decided not to use it. RocksDB, on the other hand, supports concurrent writes and is used in production at both Google and Facebook.
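As a rough illustration of why lock granularity matters for this kind of workload, the sketch below guards posting lists with one lock per shard instead of a single lock for everything, so writers touching lists in different shards do not wait on each other. This is not how BoltDB or RocksDB is implemented; `shardedStore`, `numShards`, and the hashing scheme are hypothetical names and choices made only for this example.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// numShards is an arbitrary choice for this sketch.
const numShards = 16

// shard holds a subset of the posting lists behind its own lock.
type shard struct {
	mu    sync.RWMutex
	lists map[string][]uint64
}

// shardedStore is a hypothetical store that spreads keys across shards.
type shardedStore struct {
	shards [numShards]*shard
}

func newShardedStore() *shardedStore {
	s := &shardedStore{}
	for i := range s.shards {
		s.shards[i] = &shard{lists: make(map[string][]uint64)}
	}
	return s
}

// shardFor picks a shard by hashing the key.
func (s *shardedStore) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return s.shards[h.Sum32()%numShards]
}

// add appends uid to the list under key, locking only that key's shard.
func (s *shardedStore) add(key string, uid uint64) {
	sh := s.shardFor(key)
	sh.mu.Lock()
	defer sh.mu.Unlock()
	sh.lists[key] = append(sh.lists[key], uid)
}

// length reads the list size under a shared (read) lock on the shard.
func (s *shardedStore) length(key string) int {
	sh := s.shardFor(key)
	sh.mu.RLock()
	defer sh.mu.RUnlock()
	return len(sh.lists[key])
}

func main() {
	store := newShardedStore()
	var wg sync.WaitGroup
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			key := fmt.Sprintf("list-%d", w)
			for i := 0; i < 1000; i++ {
				store.add(key, uint64(i))
			}
		}(w)
	}
	wg.Wait()
	fmt.Println(store.length("list-0")) // prints 1000
}
```

With a single global lock, all eight writers in `main` would serialize on the same mutex; with per-shard locks, only writers whose keys hash to the same shard contend.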

Can Dgraph run on other databases, like Cassandra, MySQL, etc.?

No. Dgraph stores and handles data natively to ensure it has complete control over performance and latency. The only thing between Dgraph and disk is the key-value application library, RocksDB.

Languages and Features

Does Dgraph support GraphQL?

Dgraph started with the aim of fully supporting GraphQL. However, as our experience with the language grew, we started hitting its limits. It couldn't support many of the features required of a language meant to interact with graph data, and we felt some of its features were unnecessary and complicated. So, we've created a simplified and feature-rich version of GraphQL. For lack of a better name, we're calling it GraphQL+-. You can read more about it here.

When is Dgraph going to support Gremlin?

Dgraph will aim to support Gremlin[2] after v1.0. However, this is not set in stone. If our community wants Gremlin support in order to interact with other frameworks, like TinkerPop, we can look into supporting it earlier.

Is Dgraph going to support Cypher?

If there is a demand for it, Dgraph could support Cypher[3]. It would most likely be after v1.0.

Can Dgraph support X?

Please see the Dgraph product roadmap[4] for what we're planning to support in v1.0. If X is not part of it, please feel free to start a discussion at discuss.dgraph.io[5], or file a GitHub issue[6].

Long Term Plans

Would Dgraph remain open source?

Yes. We've chosen the liberal Apache License 2.0[7] for our core code base, and we have no plans to change that. This open source code base is aimed at backend developers.

Would Dgraph be well supported?

Yes. We're VC funded and plan to use the funds for development. We have a dedicated team of really smart engineers working on this as their full-time job. And of course, we're always open to contributions from the wider community.

How does Dgraph plan to make money?

It's too early to say, but we will follow some proprietary plugin and support model to generate revenue and keep the company healthy and going. Note that our core code base would remain under the Apache 2.0 license, so the proprietary part would sit on top of the open core.

How can I contribute to Dgraph?

We accept both code and documentation contributions. Please see Contributing to Dgraph for more information about how to contribute.

Criticism

Dgraph is not highly available

This criticism comes from a Reddit thread:

Raft means choosing the C in CAP. "Highly Available" means choosing the A. I mean, yeah, adding consistent replication certainly means that it can be more available than something without replication, but advertising this as "highly available" is just misleading. IRC is highly available. Bigtable is. Anything built on raft isn't.

Bigtable has a master-slave architecture. By definition, if the master crashes, the entire thing is useless. That's why you must have passive backup masters ready to replace a master. That's not being highly available. (In fact, the masters probably use Chubby (Paxos/Raft) to determine who would be the main master, and who'd just sit idle waiting to become master.)

I'm not a fan of CAP theory. It oversimplifies something inherently complicated and diverse. But without going into that, let me address the point about high availability.

This is from Wikipedia: There are three principles of systems design in reliability engineering which can help achieve high availability.

  1. Elimination of single points of failure. This means adding redundancy to the system so that failure of a component does not mean failure of the entire system.
  2. Reliable crossover. In redundant systems, the crossover point itself tends to become a single point of failure. Reliable systems must provide for reliable crossover.
  3. Detection of failures as they occur. If the two principles above are observed, then a user may never see a failure. But the maintenance activity must.

Dgraph does each of these 3 things (if not already, then they're planned).

  1. We don't have a single point of failure. That's why we use Raft. Each server has the same capabilities as the next.
  2. Even if some servers go down, queries and writes would still succeed. Queries would automatically be re-routed to a healthy server. Dgraph does reliable crossover.
  3. Unless a majority of the cluster goes down, the user wouldn't see the failure. But the maintainer would know about it.

Given these 3, I think I'm right to claim that Dgraph is highly available.
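The majority condition in point 3 can be made concrete with a little arithmetic. The sketch below computes, for a majority-based group such as a Raft group, how many servers must be reachable and how many failures can be tolerated. The function names are hypothetical and this is plain quorum arithmetic, not Dgraph code.

```go
package main

import "fmt"

// quorum returns how many of n servers must be reachable for a
// majority-based group to keep making progress.
func quorum(n int) int { return n/2 + 1 }

// tolerated returns how many of n servers can fail while a majority remains.
func tolerated(n int) int { return n - quorum(n) }

// available reports whether a group of n servers with `down` failed
// servers still has a majority.
func available(n, down int) bool { return n-down >= quorum(n) }

func main() {
	for _, n := range []int{3, 5, 7} {
		fmt.Printf("servers=%d quorum=%d tolerated failures=%d\n",
			n, quorum(n), tolerated(n))
	}
	fmt.Println(available(5, 2)) // prints true: 3 of 5 servers remain
	fmt.Println(available(5, 3)) // prints false: only 2 of 5 remain
}
```

So a 3-server group survives one failure and a 5-server group survives two, which is why such groups are usually sized with an odd number of servers.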

References

  1. https://github.com/dgraph-io/dgraph/issues?q=is%3Aissue+is%3Aopen+label%3Abenchmark
  2. https://github.com/tinkerpop/gremlin/wiki
  3. https://neo4j.com/developer/cypher-query-language/
  4. https://github.com/dgraph-io/dgraph/issues/1
  5. https://discuss.dgraph.io
  6. https://github.com/dgraph-io/dgraph/issues
  7. http://www.apache.org/licenses/LICENSE-2.0