Most "comparison" articles give you a feature matrix and call it a day. This one does not. We are going to stress-test three fundamentally different approaches to storing and querying graph-shaped data, find their breaking points, and show you exactly when each one fails.
The contenders:
+------------------+ +------------------+ +------------------+
| Neo4j | | Postgres + AGE | | JSONL |
| Native Graph DB | | Relational + | | Flat Files, |
| Cypher queries | | Graph Extension | | No DB at all |
| Index-free adj. | | openCypher on | | Line-delimited |
| $$$$ at scale | | top of tables | | JSON objects |
+------------------+ +------------------+ +------------------+
The Problem: Why Graph-Shaped Data Exists
Not all data is tabular. Consider a social network, a dependency tree, a fraud detection ring, or a knowledge graph. The relationships between entities carry as much meaning as the entities themselves.
Here is the toy graph we will use throughout this post:
[Alice]
/ \
KNOWS KNOWS
/ \
[Bob]-------[Carol]
| KNOWS |
WORKS_AT WORKS_AT
| |
(Acme) (Globex)
| |
COMPETES COMPETES
\ /
(Initech)
Six nodes. Seven edges. Simple enough to reason about, complex enough to expose real differences.
Round 1: Data Modeling
Neo4j
Cypher makes graph creation feel natural:
CREATE (a:Person {name: 'Alice'})
CREATE (b:Person {name: 'Bob'})
CREATE (c:Person {name: 'Carol'})
CREATE (acme:Company {name: 'Acme'})
CREATE (globex:Company {name: 'Globex'})
CREATE (initech:Company {name: 'Initech'})
CREATE (a)-[:KNOWS]->(b)
CREATE (a)-[:KNOWS]->(c)
CREATE (b)-[:KNOWS]->(c)
CREATE (b)-[:WORKS_AT]->(acme)
CREATE (c)-[:WORKS_AT]->(globex)
CREATE (acme)-[:COMPETES]->(initech)
CREATE (globex)-[:COMPETES]->(initech)
Nodes and edges are first-class citizens. Relationship types are part of the schema. This is genuinely elegant.
Postgres + AGE
AGE bolts openCypher onto PostgreSQL. You create a graph namespace, then write Cypher inside SQL:
-- Enable the extension
CREATE EXTENSION age;
LOAD 'age';
SET search_path = ag_catalog, "$user", public;
SELECT create_graph('social');
SELECT * FROM cypher('social', $$
CREATE (a:Person {name: 'Alice'})
CREATE (b:Person {name: 'Bob'})
CREATE (c:Person {name: 'Carol'})
CREATE (acme:Company {name: 'Acme'})
CREATE (globex:Company {name: 'Globex'})
CREATE (initech:Company {name: 'Initech'})
CREATE (a)-[:KNOWS]->(b)
CREATE (a)-[:KNOWS]->(c)
CREATE (b)-[:KNOWS]->(c)
CREATE (b)-[:WORKS_AT]->(acme)
CREATE (c)-[:WORKS_AT]->(globex)
CREATE (acme)-[:COMPETES]->(initech)
CREATE (globex)-[:COMPETES]->(initech)
$$) as (v agtype);
Same Cypher, but wrapped in a SQL function call. Under the hood, AGE stores vertices and edges in regular PostgreSQL tables with agtype (a JSONB-like binary type) columns. This is both its strength and its weakness, as we will see.
JSONL
No schema. No server. Just lines in a file:
{"id":"alice","type":"Person","name":"Alice"}
{"id":"bob","type":"Person","name":"Bob"}
{"id":"carol","type":"Person","name":"Carol"}
{"id":"acme","type":"Company","name":"Acme"}
{"id":"globex","type":"Company","name":"Globex"}
{"id":"initech","type":"Company","name":"Initech"}
{"src":"alice","rel":"KNOWS","dst":"bob"}
{"src":"alice","rel":"KNOWS","dst":"carol"}
{"src":"bob","rel":"KNOWS","dst":"carol"}
{"src":"bob","rel":"WORKS_AT","dst":"acme"}
{"src":"carol","rel":"WORKS_AT","dst":"globex"}
{"src":"acme","rel":"COMPETES","dst":"initech"}
{"src":"globex","rel":"COMPETES","dst":"initech"}
13 lines. No dependencies. Readable by any language, any tool, any era. Append-only writes are trivial: echo '{"src":"dave","rel":"KNOWS","dst":"alice"}' >> graph.jsonl.
Verdict: Round 1
Modeling Elegance
+------+------+------+
| Neo4j| AGE | JSONL|
| 9 | 7 | 6 |
+------+------+------+
Schema clarity: Neo4j > AGE >> JSONL
Setup friction: JSONL > AGE >> Neo4j
Neo4j wins on expressiveness. JSONL wins on zero friction. AGE sits in the middle, inheriting Cypher's expressiveness but with the ceremony of SQL wrapping.
But wait -- JSONL's simplicity is deceptive. There is no schema enforcement. Nothing prevents you from writing {"src":"alice","rel":"KNOWZ","dst":"bob"} with a typo. You traded elegance for fragility. That is a real cost.
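That fragility is cheap to guard against, though. Here is a minimal validator sketch -- the VALID_RELS vocabulary and required keys are assumptions for this toy graph, not part of any JSONL standard:

```python
import json

# Assumed relationship vocabulary for this toy graph -- adjust to your schema.
VALID_RELS = {"KNOWS", "WORKS_AT", "COMPETES"}

def validate_line(line):
    """Return a list of problems found in one JSONL record (empty = valid)."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    errors = []
    if "src" in obj:  # edge record
        for key in ("src", "rel", "dst"):
            if key not in obj:
                errors.append(f"edge missing '{key}'")
        if obj.get("rel") not in VALID_RELS:
            errors.append(f"unknown relationship {obj.get('rel')!r}")
    else:  # node record
        for key in ("id", "type", "name"):
            if key not in obj:
                errors.append(f"node missing '{key}'")
    return errors

print(validate_line('{"src":"alice","rel":"KNOWZ","dst":"bob"}'))
# ["unknown relationship 'KNOWZ'"]
```

Run it as a pre-commit hook or a pipeline step and the KNOWZ typo is caught at write time instead of at query time. You are still hand-rolling what a schema gives you for free, but it is thirty lines, not a database.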
Round 2: Querying -- The 3-Hop Traversal
The question that separates graph systems from pretenders: "Find all companies that compete with a company where a friend-of-Alice works."
In our graph, Alice knows Bob and Carol. Bob works at Acme, Carol works at Globex. Both Acme and Globex compete with Initech. So the answer is {Initech}.
Alice --KNOWS--> Bob --WORKS_AT--> Acme --COMPETES--> Initech
Alice --KNOWS--> Carol --WORKS_AT--> Globex --COMPETES--> Initech
^^^^^^^
ANSWER
Neo4j
MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(friend)
-[:WORKS_AT]->(company)-[:COMPETES]->(competitor)
RETURN DISTINCT competitor.name
One query. Reads like a sentence. Under the hood, Neo4j uses index-free adjacency: each node physically stores pointers to its neighbors. Traversal is O(1) per hop, not O(log n) like an index lookup. For a 3-hop traversal on a million-node graph, this difference is enormous.
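The shape of that difference can be sketched in plain Python -- this is an illustration of the two access patterns, not Neo4j's actual storage layer. The dict stands in for index-free adjacency (each node holds direct references to its neighbors); the sorted edge list plus binary search stands in for a B-tree index lookup:

```python
import bisect

# Index-free adjacency: each node stores its neighbor list directly.
adjacency = {"alice": ["bob", "carol"], "bob": ["carol"], "carol": []}

def hop_pointer(node):
    """One hop via a stored neighbor list: a single constant-time lookup."""
    return adjacency.get(node, [])

# Index-based adjacency: all edges in one sorted list, found by binary
# search -- roughly how a B-tree index lookup behaves, O(log n) per hop.
edges_sorted = sorted([("alice", "bob"), ("alice", "carol"), ("bob", "carol")])

def hop_index(node):
    """One hop via binary search over the global sorted edge list."""
    i = bisect.bisect_left(edges_sorted, (node, ""))
    out = []
    while i < len(edges_sorted) and edges_sorted[i][0] == node:
        out.append(edges_sorted[i][1])
        i += 1
    return out
```

Both return the same neighbors; the difference is that `hop_index` pays a log-factor search against the whole edge set on every single hop, and that factor compounds across a deep traversal.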
Postgres + AGE
SELECT * FROM cypher('social', $$
MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(friend)
-[:WORKS_AT]->(company)-[:COMPETES]->(competitor)
RETURN DISTINCT competitor.name
$$) as (competitor agtype);
Identical Cypher. But here is the catch that most articles skip: AGE's Cypher executor does not push property predicates down to PostgreSQL indexes. The {name: 'Alice'} filter on the Person vertex? AGE performs a sequential scan on the vertex table and then filters in its own executor layer. This is documented in AGE issue #2348 (March 2026). For small graphs, you will not notice. At 10M+ vertices, you are in trouble.
Furthermore, AGE's variable-length edge (VLE) expansion is O(n^k) with no cycle detection (issue #2349). The shortestPath() function internally uses VLE instead of BFS (issue #2350). For deep traversals on cyclic graphs, AGE can spiral into exponential blowup where Neo4j handles it with bounded BFS.
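To see why cycle detection matters, here is an illustrative sketch in plain Python (not AGE internals): on a small graph with overlapping cycles, bounded BFS with a visited set expands each node once, while naive fixed-depth path expansion re-walks the cycles and its path count doubles every two hops:

```python
from collections import deque

# Toy graph with overlapping cycles: a -> b -> a and a -> c -> a.
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}

def reachable_bfs(start):
    """Bounded BFS: the visited set means each node is expanded once,
    so cycles cannot cause repeated work."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def count_paths_naive(node, depth):
    """Fixed-depth expansion with no visited set -- the failure mode the
    VLE issue describes: every lap around a cycle spawns new paths."""
    if depth == 0:
        return 1
    return sum(count_paths_naive(nxt, depth - 1) for nxt in adj.get(node, []))

print(len(reachable_bfs("a")))      # 3 -- bounded by node count
print(count_paths_naive("a", 10))   # 32 -- doubling every two hops
```

Three nodes, and the naive expander is already enumerating 32 paths at depth 10. Scale the node count and depth up and you get the exponential blowup described above.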
JSONL
You write the traversal yourself:
import json
from collections import defaultdict
nodes = {}
edges = defaultdict(list)
with open('graph.jsonl') as f:
for line in f:
obj = json.loads(line)
if 'src' in obj:
edges[obj['src']].append(obj)
else:
nodes[obj['id']] = obj
# 3-hop traversal: Alice -> KNOWS -> WORKS_AT -> COMPETES
alice_friends = [e['dst'] for e in edges['alice']
if e['rel'] == 'KNOWS']
friend_companies = [e['dst'] for f in alice_friends
                    for e in edges.get(f, [])
                    if e['rel'] == 'WORKS_AT']
competitors = {e['dst'] for c in friend_companies
for e in edges.get(c, [])
if e['rel'] == 'COMPETES'}
print(competitors) # {'initech'}
It works. But you just wrote a graph traversal engine from scratch. Every new query type requires new code. There is no query planner, no optimizer, no index. You are doing full scans through adjacency lists you built in memory.
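The hand-coded query above does generalize with a little effort. Here is a sketch of a reusable frontier-expansion helper (the function names are mine, not from any library) that turns any fixed hop pattern into one call:

```python
from collections import defaultdict

def build_index(records):
    """Index edge records by source node for constant-time neighbor lookup."""
    edges = defaultdict(list)
    for obj in records:
        if "src" in obj:
            edges[obj["src"]].append(obj)
    return edges

def follow(edges, start, rel_path):
    """Follow a fixed sequence of relationship types from a start node,
    returning the set of nodes reached at the end of the path."""
    frontier = {start}
    for rel in rel_path:
        frontier = {e["dst"]
                    for node in frontier
                    for e in edges.get(node, [])
                    if e["rel"] == rel}
    return frontier

records = [
    {"src": "alice", "rel": "KNOWS", "dst": "bob"},
    {"src": "alice", "rel": "KNOWS", "dst": "carol"},
    {"src": "bob", "rel": "WORKS_AT", "dst": "acme"},
    {"src": "carol", "rel": "WORKS_AT", "dst": "globex"},
    {"src": "acme", "rel": "COMPETES", "dst": "initech"},
    {"src": "globex", "rel": "COMPETES", "dst": "initech"},
]
edges = build_index(records)
print(follow(edges, "alice", ["KNOWS", "WORKS_AT", "COMPETES"]))
# {'initech'}
```

That buys you reusability for chain-shaped queries, but notice what it does not buy: variable-length paths, branching patterns, predicates on properties, or a planner that reorders hops by selectivity. Each of those is another feature you would write by hand.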
Verdict: Round 2
Query Power (Multi-Hop)
+------+------+------+
| Neo4j| AGE | JSONL|
| 10 | 6 | 3 |
+------+------+------+
Why AGE loses points:
- No index pushdown for Cypher property filters
- VLE is O(n^k), no cycle detection
- shortestPath() not BFS-optimized
Why JSONL loses more:
- Every query is hand-coded
- No query optimizer
- Full dataset in memory or repeated scans
Neo4j dominates here. This is its entire reason for existing. AGE gives you the syntax but not the engine. JSONL gives you nothing but raw materials.
Falsification check: could you beat Neo4j's multi-hop performance with JSONL? Yes -- if the graph fits in RAM, a hand-tuned adjacency list in C++ or Rust with cache-aligned memory layout can outperform Neo4j's JVM-based engine. But you are now building a database, not using a file format.
Round 3: Mixed Workloads (Graph + Relational)
Here is where the debate gets interesting. Real applications rarely have pure graph queries. You also need: aggregate revenue by company, filter persons by signup date, join with an orders table, run window functions.
Neo4j
Neo4j has no GROUP BY with window functions. No foreign keys to an orders table. No JOIN with a relational dataset. If your application needs both graph traversals and analytical SQL, you end up running two databases and syncing between them.
+--------+ +--------+
| Neo4j | sync | Postgres|
| (graph)|<-------->| (OLAP) |
+--------+ +--------+
^ ^
| |
+--- Your app ------+
(two connections, two schemas,
two failure modes, eventual
consistency between them)
This is the hidden cost of Neo4j that vendors do not emphasize. You pay for the graph engine, then you pay again for the operational complexity of a dual-database architecture.
Postgres + AGE
This is AGE's killer feature. One database. Graph queries and SQL queries coexist:
-- Graph traversal
SELECT * FROM cypher('social', $$
MATCH (p:Person)-[:WORKS_AT]->(c:Company)
RETURN p.name, c.name
$$) as (person agtype, company agtype);
-- In the same database, same transaction:
SELECT company_name, SUM(revenue)
FROM quarterly_earnings
WHERE quarter = '2026-Q1'
GROUP BY company_name
ORDER BY SUM(revenue) DESC;
You can even join graph results with relational tables:
SELECT g.person, g.company, q.revenue
FROM cypher('social', $$
MATCH (p:Person)-[:WORKS_AT]->(c:Company)
RETURN p.name, c.name
$$) as g(person agtype, company agtype)
JOIN quarterly_earnings q
ON q.company_name = g.company::text
WHERE q.quarter = '2026-Q1';
One transaction. One connection. ACID across both graph and relational data.
JSONL
You load both the JSONL graph and your CSV/Parquet tables into pandas or DuckDB and join manually. Doable for analytics pipelines. Not doable for a production application serving real-time requests.
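For an offline pipeline, that manual join is genuinely only a few lines. A stdlib-only sketch, assuming a hypothetical quarterly_earnings CSV with company_name, quarter, and revenue columns (mirroring the AGE example above -- the inline data stands in for real files):

```python
import csv
import io
import json

# Stand-in for quarterly_earnings.csv on disk.
earnings_csv = io.StringIO(
    "company_name,quarter,revenue\n"
    "Acme,2026-Q1,1200000\n"
    "Globex,2026-Q1,950000\n"
)
revenue = {row["company_name"]: int(row["revenue"])
           for row in csv.DictReader(earnings_csv)
           if row["quarter"] == "2026-Q1"}

# Stand-in for WORKS_AT edges read from graph.jsonl.
graph_lines = [
    '{"src":"bob","rel":"WORKS_AT","dst":"acme"}',
    '{"src":"carol","rel":"WORKS_AT","dst":"globex"}',
]
joined = []
for line in graph_lines:
    e = json.loads(line)
    if e["rel"] == "WORKS_AT":
        # JSONL ids are lowercase; the CSV uses display names.
        company = e["dst"].capitalize()
        joined.append((e["src"], company, revenue.get(company)))

print(joined)
# [('bob', 'Acme', 1200000), ('carol', 'Globex', 950000)]
```

Note the `.capitalize()` hack: without a shared key convention between the graph file and the table, you end up writing ad-hoc key normalization at every join, which is precisely the kind of glue that ACID-in-one-database spares you.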
Verdict: Round 3
Mixed Workload (Graph + Relational)
+------+------+------+
| Neo4j| AGE | JSONL|
| 3 | 9 | 4 |
+------+------+------+
AGE wins decisively here.
The "one database" story is real and valuable.
Falsification check: Neo4j added GDS (Graph Data Science) library for analytics, and version 5.x improved aggregation capabilities. But it still cannot replace a relational database for structured analytical queries. The gap is narrowing but real.
Round 4: Operational Cost
Let us stop being coy about money.
Neo4j
- Community Edition: free, but limited to a single machine (no clustering, no online backup, no role-based access control).
- Enterprise Edition: contact sales. Typically $100K+/year for production deployments. This is not a rounding error.
- AuraDB (managed): starts around $65/month for a toy instance. Production instances with adequate memory run $500-2000+/month.
- Hidden cost: JVM tuning. Neo4j runs on the JVM. Heap sizing, GC pauses, off-heap page cache configuration. You need someone who understands JVM operations.
Neo4j Cost Curve
$
| /
| /
| /
| /
| /
| . / <-- Enterprise license kicks in
| . .
| .
| .
+-----------------------------------> Data Size
Free tier "Call Sales"
Postgres + AGE
- PostgreSQL: free, open source, battle-tested.
- AGE extension: free, Apache 2.0 licensed. No licensing games.
- Managed options: any managed Postgres provider (RDS, Cloud SQL, Supabase) -- though AGE extension support varies. You may need to self-host.
- Hidden cost: AGE is still maturing. You will hit bugs. The community is smaller. You become your own support tier.
Postgres+AGE Cost Curve
$
|
| ---------------------------- (flat: infra cost only)
|
|
+-----------------------------------> Data Size
Always open source, pay for compute only
JSONL
- Cost: zero. It is files.
- Hidden cost: engineering time. Every "feature" (indexing, querying, concurrency, durability) you build yourself. At small scale this is free. At medium scale it is cheaper than Neo4j. At large scale it is more expensive than both because you are paying senior engineers to maintain a bespoke system.
JSONL Total Cost of Ownership
$
| /
| /
| / <-- Engineering time
| / dominates
| /
| /
| /
| /
| /
| /
| . . . . . . <-- Near-zero at small scale
+-----------------------------------> Complexity
Verdict: Round 4
Total Cost of Ownership
+------+------+------+
| Neo4j| AGE | JSONL|
| 4 | 8 | 7* |
+------+------+------+
* JSONL score drops to 2 above ~1M edges
Round 5: Write Performance and Ingestion
Bulk loading 1 million edges. A common real-world requirement.
Neo4j
Neo4j ships neo4j-admin import for offline bulk loading. It is fast -- millions of nodes per second -- but requires the database to be stopped. For online ingestion, UNWIND with batched transactions achieves ~50K-100K edges/second depending on hardware.
Postgres + AGE
This is where AGE currently hurts. Multiple GitHub issues document the pain:
- Issue #2177: MERGE on 10K+ edges becomes unusably slow (minutes for 70K edges)
- Issue #1925: 83K edge creation causes severe slowdown
- Issue #2198: large-table edge creation performance degrades non-linearly
AGE creates edges by inserting rows into a PostgreSQL table. Each edge insert requires lookups to validate source and target vertices. Without proper index pushdown (issue #2348), these lookups scale poorly.
For bulk loading, you can bypass Cypher and insert directly into AGE's internal tables -- but this requires understanding AGE's internal schema, which is undocumented and fragile.
JSONL
# 1 million edges
time python3 -c "
import json
for i in range(1_000_000):
print(json.dumps({'src': f'n{i}', 'rel': 'LINKS', 'dst': f'n{i+1}'}))
" > edges.jsonl
This completes in seconds. Append is O(1). No index to update, no constraint to check, no transaction to commit. You cannot beat the write speed of "just append text to a file."
But you pay for it later. Every read is now a full scan unless you build your own index.
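Building that index does not require a database -- one pass over the file to record byte offsets is enough. A sketch (the scheme and function names are my own):

```python
import json
import os
import tempfile

def build_offset_index(path):
    """One pass over the file, recording the byte offset of every edge
    line keyed by source node. Later reads become a seek plus one line
    parse instead of a full-file scan."""
    index = {}
    with open(path, "rb") as f:
        offset = 0
        for line in f:
            obj = json.loads(line)
            if "src" in obj:
                index.setdefault(obj["src"], []).append(offset)
            offset += len(line)
    return index

def edges_from(path, index, src):
    """Seek straight to each stored offset and parse only those lines."""
    results = []
    with open(path, "rb") as f:
        for offset in index.get(src, []):
            f.seek(offset)
            results.append(json.loads(f.readline()))
    return results

# Tiny demo file.
tmp = tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False)
tmp.write('{"src":"alice","rel":"KNOWS","dst":"bob"}\n')
tmp.write('{"src":"bob","rel":"WORKS_AT","dst":"acme"}\n')
tmp.close()

idx = build_offset_index(tmp.name)
print(edges_from(tmp.name, idx, "bob"))
os.unlink(tmp.name)
```

The catch, of course, is that the index lives in memory and must be rebuilt (or incrementally maintained) on every append. You have taken the first step toward writing a storage engine, which is the whole point of the trade-off.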
Verdict: Round 5
Write / Bulk Ingest Speed
+------+------+------+
| Neo4j| AGE | JSONL|
| 7 | 4 | 10 |
+------+------+------+
JSONL: unbeatable write speed, unbearable read speed
Neo4j: good with offline import, decent online
AGE: the weakest link for large ingestion
Round 6: Ecosystem and Tooling
Neo4j
- Visualization: Neo4j Browser, Bloom (enterprise), extensive third-party tools
- Language drivers: official drivers for Java, Python, JavaScript, .NET, Go
- GDS library: PageRank, community detection, pathfinding, embeddings
- APOC: 500+ utility procedures
- Community: large, mature, well-documented
Postgres + AGE
- Visualization: limited; AGE Viewer exists but is basic
- Language drivers: any PostgreSQL driver works, but handling agtype results requires extra parsing
- Graph algorithms: none built-in. You write them in Cypher or fall back to SQL
- Community: growing but small. ~2K GitHub stars vs Neo4j's ~13K+
- Documentation: improving, but gaps remain. Many "how do I do X?" questions on GitHub issues have no official answer
JSONL
- Visualization: none (build your own, or load into NetworkX/Gephi)
- Language drivers: every language has a JSON parser
- Graph algorithms: use NetworkX, igraph, or write from scratch
- Community: infinite (it is just JSON) and zero (no graph-specific tooling)
Verdict: Round 6
Ecosystem & Tooling
+------+------+------+
| Neo4j| AGE | JSONL|
| 10 | 5 | 3 |
+------+------+------+
The Honest Scorecard
Category Neo4j AGE JSONL
------------------------------------------------
Data Modeling 9 7 6
Multi-Hop Queries 10 6 3
Mixed Workloads 3 9 4
Operational Cost 4 8 7*
Write Performance 7 4 10
Ecosystem 10 5 3
------------------------------------------------
TOTAL 43 39 33
* JSONL drops to 28 at scale (cost becomes 2)
But totals are misleading. Nobody needs all six categories equally. Here is the real decision framework:
When Each Wins (Honestly)
Choose Neo4j when:
- Your core product IS the graph (social network, fraud ring detection, recommendation engine)
- You need multi-hop traversals deeper than 3 hops on 10M+ node graphs
- You can afford the license or AuraDB pricing
- You have a dedicated data engineering team for JVM tuning
- Graph algorithms (PageRank, community detection) are a primary use case
Choose Postgres + AGE when:
- You already run PostgreSQL and want to add graph queries without a second database
- Your workload is 70% relational, 30% graph
- Your graph is under 5M nodes (AGE's performance ceiling today)
- You need ACID transactions spanning graph and relational data
- Budget is a hard constraint and vendor lock-in is unacceptable
- You value the option to query graph data with SQL joins
Choose JSONL when:
- You are prototyping or exploring data before committing to infrastructure
- Your graph is small (under 100K edges) and mostly read at startup
- You need maximum portability (the data goes into git, S3, email attachments)
- You are building a data pipeline where the graph is intermediate, not the product
- You want append-only audit logs of graph mutations
- You are one person on a weekend project
When Each Loses (Honestly)
Neo4j loses when:
- You need relational analytics alongside graph queries (you end up with two databases)
- Your startup cannot justify the Enterprise license
- Your graph is small enough that PostgreSQL handles it fine
- You need your ops team to manage one less piece of infrastructure
- Your data model changes frequently (schema migrations in Neo4j are manual)
Postgres + AGE loses when:
- You need deep traversals (5+ hops) on large graphs -- VLE blows up
- You need shortestPath() on cyclic graphs at scale -- it is not BFS
- You need to bulk-ingest millions of edges quickly
- You need graph-specific algorithms (PageRank, Louvain) out of the box
- You cannot tolerate being an early adopter of a maturing extension
JSONL loses when:
- You need concurrent writes from multiple processes
- You need real-time traversal queries on graphs larger than RAM
- You need ACID guarantees
- You need anyone other than the original author to query the data
- You are building anything that will be in production for more than 6 months
The Falsification Test
Every claim above should be testable. Here are the ones I am least certain about:
- "AGE's VLE is O(n^k)" -- This is documented in AGE issue #2349, but the AGE team may fix it. If you are reading this post after mid-2026, check whether BFS-based path expansion has been merged.
- "Neo4j's index-free adjacency makes it O(1) per hop" -- Strictly true for cache-hot data. When the graph exceeds page cache, it degrades to random disk I/O, which is closer to O(log n) in practice. Neo4j's marketing overstates this.
- "JSONL beats all for write speed" -- True for raw append. But if you need to deduplicate on write, you lose this advantage entirely. Dedup requires an index, and now you are building a database.
- "Postgres+AGE is free" -- The software is free. But if you need graph algorithm support comparable to Neo4j GDS, you will either write it yourself (expensive) or use a separate tool (complexity cost).
Closing Thought
The question is never "which is the best graph database?" The question is "how much of my problem is actually a graph problem?"
If the answer is "almost all of it" -- Neo4j. If the answer is "some of it, alongside relational needs" -- Postgres+AGE. If the answer is "I'm not sure yet" -- start with JSONL, learn your access patterns, then migrate to the right tool.
The worst outcome is choosing Neo4j for a problem that is 80% relational, or choosing JSONL for a problem that needs real-time 5-hop traversals. Both are expensive mistakes, just in different currencies: money for the first, engineering time for the second.
Decision Flow:
Is your core product a graph?
|
+-- YES --> Can you afford Neo4j Enterprise?
| |
| +-- YES --> Neo4j
| +-- NO --> Neo4j Community (single node)
| or Postgres+AGE
|
+-- NO --> Do you need graph queries in production?
|
+-- YES --> Postgres+AGE
+-- NO --> JSONL (prototype, then decide)
Pick based on where your problem actually lives, not where the marketing says it should.
Director of Agentic AI for the Enterprise at Writer. Building at the intersection of language, intelligence, and design.