Performance
How gr stores data for fast reads, the knobs that matter (bulk import, the buffer pool, indexes, parallel execution, checkpointing), and how to measure a query.
gr is built so that the common graph query, a pattern match that walks relationships, stays close to the speed of the underlying storage. This page explains where the time goes and which levers move it.
The storage shape
A .gr file holds two layers.
The base is the sealed, read-optimized image: node and relationship properties in columnar segments, and adjacency in CSR (compressed sparse row) arrays so a node's neighbors sit contiguously and expand in a single sequential scan.
The delta is everything written since the last checkpoint, living in the write-ahead log and an in-memory overlay. Reads merge the delta on top of the base.
A checkpoint folds the delta back into the base.
After a checkpoint, a read of an unchanged node hits the base CSR directly with
no overlay merge, which is the fastest path gr has.
Checkpoints happen automatically (see pragmas:
wal_autocheckpoint every 1000 WAL frames by default, plus the optional
checkpoint_interval_s) and on db.Close().
The practical consequence: a database that has just been bulk-loaded or freshly checkpointed reads faster than one carrying a large uncommitted delta. If you load a lot of data through Cypher and then run read-heavy queries, a checkpoint in between pays for itself.
Loading fast
For a cold load of more than roughly 100,000 nodes or relationships, use
gr import rather than individual CREATE
statements.
The importer writes the columnar segments and CSR arrays directly, skips the
WAL, and fsyncs once at the end, so it is typically 10 to 100x faster than the
transactional write path for the same data.
The file it produces is a normal, sealed .gr file: it opens with gr.Open,
passes gr check, and is indistinguishable from one grown transactionally.
Use the transactional path (the library or the CLI) for incremental writes after the initial load, not for the initial load itself.
The buffer pool
gr caches pages in a buffer pool. A query that finds its pages already resident never touches the disk.
Size it to your working set with cache_size, or let gr size it from available
memory with cache_auto_fraction (0.25 of RAM by default):
PRAGMA cache_size = -262144 -- 256 MB, negative value is kibibytes
For a one-shot bulk read that would otherwise evict your hot pages, set
PRAGMA cache_disabled = true for that connection so the scan does not pollute
the pool.
Indexes turn scans into seeks
Without an index, a MATCH that filters on a property scans every node of the
label.
With an index, gr seeks straight to the matching nodes.
gr run graph.gr "CREATE INDEX FOR (p:Person) ON (p.email)"
Create indexes on the properties you filter or join on, and create them after a
bulk import rather than before, so the load does not pay to maintain them.
Use EXPLAIN to confirm the planner picked the seek:
gr run graph.gr "EXPLAIN MATCH (p:Person {email:'[email protected]'}) RETURN p"
Parallel execution
By default a query runs on a single worker. For large scans and aggregations, raise the worker count so gr splits the work across cores:
PRAGMA parallelism = 8
Execution is vectorized: rows flow through operators in batches (morsels) rather
than one at a time.
morsel_size (1024 rows by default) sets the batch size.
Parallelism helps queries that touch a lot of data; it does nothing for a query
that seeks a single node, and on a saturated machine more workers can lose to a
single-threaded run, so set it to match the cores you actually have free.
Repeated queries
gr caches parsed and planned queries (plan_cache, on by default) and extracts
literal constants as parameters so structurally identical queries share a plan
(auto_parameterize, on by default).
You get the most out of both by parameterizing queries yourself instead of
inlining values, which also avoids re-planning and is safer:
db.Query(ctx, "MATCH (p:Person {email:$email}) RETURN p", map[string]any{"email": addr})
Write throughput and durability
Write speed trades against how hard gr works to survive a crash.
The default fsyncs on checkpoint so a power failure cannot corrupt the file.
SyncNormal skips the extra post-checkpoint fsync (safe on most filesystems),
and SyncOff turns off fsyncing entirely: fast, but a power failure can lose
recent writes.
See opening a database for the sync modes and
pragmas for wal_autocheckpoint and wal_size_limit,
which control how large the delta grows before it is folded back.
Measuring a query
PROFILE runs the query and reports the operator tree with row counts and
timing, so you can see which operator dominates:
gr run graph.gr "PROFILE MATCH (a:Person)-[:KNOWS]->(b) RETURN count(*)"
EXPLAIN shows the plan without running it, which is enough to confirm an index
is used or a join order is sane.
For a repeatable microbenchmark, drive gr from a Go testing.B and measure with
go test -bench, checkpointing first so the read path hits the base CSR rather
than a warm in-memory delta.
A query measured against an un-checkpointed database is measuring the delta
overlay, not the storage gr actually ships data on.
To compare gr against other graph engines on your own workload, the graph-bench harness runs the same query set against several engines in one process and reports per-percentile latency. Numbers depend heavily on the workload and the machine, so measure the queries you actually run, on the hardware you actually deploy on.