Most "comparison" articles give you a feature matrix and call it a day. This one does not. We are going to stress-test three fundamentally different approaches to storing and querying graph-shaped data, find their breaking points, and show you exactly when each one fails.
The contenders:
+------------------+ +------------------+ +------------------+
| Neo4j | | Postgres + AGE | | JSONL |
| Native Graph DB | | Relational + | | Flat Files, |
| Cypher queries | | Graph Extension | | No DB at all |
| Index-free adj. | | openCypher on | | Line-delimited |
| $$$$ at scale | | top of tables | | JSON objects |
+------------------+ +------------------+ +------------------+
The Problem: Why Graph-Shaped Data Exists
Not all data is tabular. Consider a social network, a dependency tree, a fraud detection ring, or a knowledge graph. The relationships between entities carry as much meaning as the entities themselves.
Here is the toy graph we will use throughout this post:
[Alice]
/ \
KNOWS KNOWS
/ \
[Bob]-------[Carol]
| KNOWS |
WORKS_AT WORKS_AT
| |
(Acme) (Globex)
| |
COMPETES COMPETES
\ /
(Initech)
Six nodes. Seven edges. Simple enough to reason about, complex enough to expose real differences.
Round 1: Data Modeling
Neo4j
Cypher makes graph creation feel natural:
CREATE (a:Person {name: 'Alice'})
CREATE (b:Person {name: 'Bob'})
CREATE (c:Person {name: 'Carol'})
CREATE (acme:Company {name: 'Acme'})
CREATE (globex:Company {name: 'Globex'})
CREATE (initech:Company {name: 'Initech'})
CREATE (a)-[:KNOWS]->(b)
CREATE (a)-[:KNOWS]->(c)
CREATE (b)-[:KNOWS]->(c)
CREATE (b)-[:WORKS_AT]->(acme)
CREATE (c)-[:WORKS_AT]->(globex)
CREATE (acme)-[:COMPETES]->(initech)
CREATE (globex)-[:COMPETES]->(initech)
Nodes and edges are first-class citizens. Relationship types are part of the schema. This is genuinely elegant.
Postgres + AGE
AGE bolts openCypher onto PostgreSQL. You create a graph namespace, then write Cypher inside SQL:
-- Enable the extension
CREATE EXTENSION age;
LOAD 'age';
SET search_path = ag_catalog, "$user", public;
SELECT create_graph('social');
SELECT * FROM cypher('social', $$
CREATE (a:Person {name: 'Alice'})
CREATE (b:Person {name: 'Bob'})
CREATE (c:Person {name: 'Carol'})
CREATE (acme:Company {name: 'Acme'})
CREATE (globex:Company {name: 'Globex'})
CREATE (initech:Company {name: 'Initech'})
CREATE (a)-[:KNOWS]->(b)
CREATE (a)-[:KNOWS]->(c)
CREATE (b)-[:KNOWS]->(c)
CREATE (b)-[:WORKS_AT]->(acme)
CREATE (c)-[:WORKS_AT]->(globex)
CREATE (acme)-[:COMPETES]->(initech)
CREATE (globex)-[:COMPETES]->(initech)
$$) as (v agtype);
Same Cypher, but wrapped in a SQL function call. Under the hood, AGE stores vertices and edges in regular PostgreSQL tables with agtype (a JSONB-like binary type) columns. This is both its strength and its weakness, as we will see.
JSONL
No schema. No server. Just lines in a file:
{"id":"alice","type":"Person","name":"Alice"}
{"id":"bob","type":"Person","name":"Bob"}
{"id":"carol","type":"Person","name":"Carol"}
{"id":"acme","type":"Company","name":"Acme"}
{"id":"globex","type":"Company","name":"Globex"}
{"id":"initech","type":"Company","name":"Initech"}
{"src":"alice","rel":"KNOWS","dst":"bob"}
{"src":"alice","rel":"KNOWS","dst":"carol"}
{"src":"bob","rel":"KNOWS","dst":"carol"}
{"src":"bob","rel":"WORKS_AT","dst":"acme"}
{"src":"carol","rel":"WORKS_AT","dst":"globex"}
{"src":"acme","rel":"COMPETES","dst":"initech"}
{"src":"globex","rel":"COMPETES","dst":"initech"}
13 lines. No dependencies. Readable by any language, any tool, any era. Append-only writes are trivial: echo '{"src":"dave","rel":"KNOWS","dst":"alice"}' >> graph.jsonl.
Verdict: Round 1
Modeling Elegance
+------+------+------+
| Neo4j| AGE | JSONL|
| 9 | 7 | 6 |
+------+------+------+
Schema clarity: Neo4j > AGE >> JSONL
Setup friction: JSONL > AGE >> Neo4j
Neo4j wins on expressiveness. JSONL wins on zero friction. AGE sits in the middle, inheriting Cypher's expressiveness but with the ceremony of SQL wrapping.
But wait -- JSONL's simplicity is deceptive. There is no schema enforcement. Nothing prevents you from writing {"src":"alice","rel":"KNOWZ","dst":"bob"} with a typo. You traded elegance for fragility. That is a real cost.
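That fragility is cheap to guard against, though. Here is a minimal validator sketch -- the VALID_RELS vocabulary and required keys are assumptions for this toy graph, not part of any JSONL standard:

```python
import json

# Assumed relationship vocabulary for this toy graph -- adjust to your schema.
VALID_RELS = {"KNOWS", "WORKS_AT", "COMPETES"}

def validate_line(line):
    """Return a list of problems found in one JSONL record (empty = valid)."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    errors = []
    if "src" in obj:  # edge record
        for key in ("src", "rel", "dst"):
            if key not in obj:
                errors.append(f"edge missing '{key}'")
        if obj.get("rel") not in VALID_RELS:
            errors.append(f"unknown relationship {obj.get('rel')!r}")
    else:  # node record
        for key in ("id", "type", "name"):
            if key not in obj:
                errors.append(f"node missing '{key}'")
    return errors

print(validate_line('{"src":"alice","rel":"KNOWZ","dst":"bob"}'))
# ["unknown relationship 'KNOWZ'"]
```

Run it as a pre-commit hook or a pipeline step and the KNOWZ typo is caught at write time instead of at query time. You are still hand-rolling what a schema gives you for free, but it is thirty lines, not a database.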
Round 2: Querying -- The 3-Hop Traversal
The question that separates graph systems from pretenders: "Find all companies that compete with a company where a friend-of-Alice works."
In our graph, Alice knows Bob and Carol. Bob works at Acme, Carol works at Globex. Both Acme and Globex compete with Initech. So the answer is {Initech}.
Alice --KNOWS--> Bob --WORKS_AT--> Acme --COMPETES--> Initech
Alice --KNOWS--> Carol --WORKS_AT--> Globex --COMPETES--> Initech
^^^^^^^
ANSWER
Neo4j
MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(friend)
-[:WORKS_AT]->(company)-[:COMPETES]->(competitor)
RETURN DISTINCT competitor.name
One query. Reads like a sentence. Under the hood, Neo4j uses index-free adjacency: each node physically stores pointers to its neighbors. Traversal is O(1) per hop, not O(log n) like an index lookup. For a 3-hop traversal on a million-node graph, this difference is enormous.
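The shape of that difference can be sketched in plain Python -- this is an illustration of the two access patterns, not Neo4j's actual storage layer. The dict stands in for index-free adjacency (each node holds direct references to its neighbors); the sorted edge list plus binary search stands in for a B-tree index lookup:

```python
import bisect

# Index-free adjacency: each node stores its neighbor list directly.
adjacency = {"alice": ["bob", "carol"], "bob": ["carol"], "carol": []}

def hop_pointer(node):
    """One hop via a stored neighbor list: a single constant-time lookup."""
    return adjacency.get(node, [])

# Index-based adjacency: all edges in one sorted list, found by binary
# search -- roughly how a B-tree index lookup behaves, O(log n) per hop.
edges_sorted = sorted([("alice", "bob"), ("alice", "carol"), ("bob", "carol")])

def hop_index(node):
    """One hop via binary search over the global sorted edge list."""
    i = bisect.bisect_left(edges_sorted, (node, ""))
    out = []
    while i < len(edges_sorted) and edges_sorted[i][0] == node:
        out.append(edges_sorted[i][1])
        i += 1
    return out
```

Both return the same neighbors; the difference is that `hop_index` pays a log-factor search against the whole edge set on every single hop, and that factor compounds across a deep traversal.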
Postgres + AGE
SELECT * FROM cypher('social', $$
MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(friend)
-[:WORKS_AT]->(company)-[:COMPETES]->(competitor)
RETURN DISTINCT competitor.name
$$) as (competitor agtype);
Identical Cypher. But here is the catch that most articles skip: AGE's Cypher executor does not push property predicates down to PostgreSQL indexes. The {name: 'Alice'} filter on the Person vertex? AGE performs a sequential scan on the vertex table and then filters in its own executor layer. This is documented in AGE issue #2348 (March 2026). For small graphs, you will not notice. At 10M+ vertices, you are in trouble.
Furthermore, AGE's variable-length edge (VLE) expansion is O(n^k) with no cycle detection (issue #2349). The shortestPath() function internally uses VLE instead of BFS (issue #2350). For deep traversals on cyclic graphs, AGE can spiral into exponential blowup where Neo4j handles it with bounded BFS.
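To see why cycle detection matters, here is an illustrative sketch in plain Python (not AGE internals): on a small graph with overlapping cycles, bounded BFS with a visited set expands each node once, while naive fixed-depth path expansion re-walks the cycles and its path count doubles every two hops:

```python
from collections import deque

# Toy graph with overlapping cycles: a -> b -> a and a -> c -> a.
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}

def reachable_bfs(start):
    """Bounded BFS: the visited set means each node is expanded once,
    so cycles cannot cause repeated work."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def count_paths_naive(node, depth):
    """Fixed-depth expansion with no visited set -- the failure mode the
    VLE issue describes: every lap around a cycle spawns new paths."""
    if depth == 0:
        return 1
    return sum(count_paths_naive(nxt, depth - 1) for nxt in adj.get(node, []))

print(len(reachable_bfs("a")))      # 3 -- bounded by node count
print(count_paths_naive("a", 10))   # 32 -- doubling every two hops
```

Three nodes, and the naive expander is already enumerating 32 paths at depth 10. Scale the node count and depth up and you get the exponential blowup described above.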
JSONL
You write the traversal yourself:
import json
from collections import defaultdict
nodes = {}
edges = defaultdict(list)
with open('graph.jsonl') as f:
for line in f:
obj = json.loads(line)
if 'src' in obj:
edges[obj['src']].append(obj)
else:
nodes[obj['id']] = obj
# 3-hop traversal: Alice -> KNOWS -> WORKS_AT -> COMPETES
alice_friends = [e['dst'] for e in edges['alice']
if e['rel'] == 'KNOWS']
friend_companies = [e['dst'] for f in alice_friends
                    for e in edges.get(f, [])
                    if e['rel'] == 'WORKS_AT']
competitors = {e['dst'] for c in friend_companies
for e in edges.get(c, [])
if e['rel'] == 'COMPETES'}
print(competitors) # {'initech'}
It works. But you just wrote a graph traversal engine from scratch. Every new query type requires new code. There is no query planner, no optimizer, no index. You are doing full scans through adjacency lists you built in memory.
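The hand-coded query above does generalize with a little effort. Here is a sketch of a reusable frontier-expansion helper (the function names are mine, not from any library) that turns any fixed hop pattern into one call:

```python
from collections import defaultdict

def build_index(records):
    """Index edge records by source node for constant-time neighbor lookup."""
    edges = defaultdict(list)
    for obj in records:
        if "src" in obj:
            edges[obj["src"]].append(obj)
    return edges

def follow(edges, start, rel_path):
    """Follow a fixed sequence of relationship types from a start node,
    returning the set of nodes reached at the end of the path."""
    frontier = {start}
    for rel in rel_path:
        frontier = {e["dst"]
                    for node in frontier
                    for e in edges.get(node, [])
                    if e["rel"] == rel}
    return frontier

records = [
    {"src": "alice", "rel": "KNOWS", "dst": "bob"},
    {"src": "alice", "rel": "KNOWS", "dst": "carol"},
    {"src": "bob", "rel": "WORKS_AT", "dst": "acme"},
    {"src": "carol", "rel": "WORKS_AT", "dst": "globex"},
    {"src": "acme", "rel": "COMPETES", "dst": "initech"},
    {"src": "globex", "rel": "COMPETES", "dst": "initech"},
]
edges = build_index(records)
print(follow(edges, "alice", ["KNOWS", "WORKS_AT", "COMPETES"]))
# {'initech'}
```

That buys you reusability for chain-shaped queries, but notice what it does not buy: variable-length paths, branching patterns, predicates on properties, or a planner that reorders hops by selectivity. Each of those is another feature you would write by hand.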
Verdict: Round 2
Query Power (Multi-Hop)
+------+------+------+
| Neo4j| AGE | JSONL|
| 10 | 6 | 3 |
+------+------+------+
Why AGE loses points:
- No index pushdown for Cypher property filters
- VLE is O(n^k), no cycle detection
- shortestPath() not BFS-optimized
Why JSONL loses more:
- Every query is hand-coded
- No query optimizer
- Full dataset in memory or repeated scans
Neo4j dominates here. This is its entire reason for existing. AGE gives you the syntax but not the engine. JSONL gives you nothing but raw materials.
Falsification check: could you beat Neo4j's multi-hop performance with JSONL? Yes -- if the graph fits in RAM, a hand-tuned adjacency list in C++ or Rust with cache-aligned memory layout can outperform Neo4j's JVM-based engine. But you are now building a database, not using a file format.
Round 3: Mixed Workloads (Graph + Relational)
Here is where the debate gets interesting. Real applications rarely have pure graph queries. You also need: aggregate revenue by company, filter persons by signup date, join with an orders table, run window functions.
Neo4j
Neo4j has no GROUP BY with window functions. No foreign keys to an orders table. No JOIN with a relational dataset. If your application needs both graph traversals and analytical SQL, you end up running two databases and syncing between them.
+--------+ +--------+
| Neo4j | sync | Postgres|
| (graph)|<-------->| (OLAP) |
+--------+ +--------+
^ ^
| |
+--- Your app ------+
(two connections, two schemas,
two failure modes, eventual
consistency between them)
This is the hidden cost of Neo4j that vendors do not emphasize. You pay for the graph engine, then you pay again for the operational complexity of a dual-database architecture.
Postgres + AGE
This is AGE's killer feature. One database. Graph queries and SQL queries coexist:
-- Graph traversal
SELECT * FROM cypher('social', $$
MATCH (p:Person)-[:WORKS_AT]->(c:Company)
RETURN p.name, c.name
$$) as (person agtype, company agtype);
-- In the same database, same transaction:
SELECT company_name, SUM(revenue)
FROM quarterly_earnings
WHERE quarter = '2026-Q1'
GROUP BY company_name
ORDER BY SUM(revenue) DESC;
You can even join graph results with relational tables:
SELECT g.person, g.company, q.revenue
FROM cypher('social', $$
MATCH (p:Person)-[:WORKS_AT]->(c:Company)
RETURN p.name, c.name
$$) as g(person agtype, company agtype)
JOIN quarterly_earnings q
ON q.company_name = g.company::text
WHERE q.quarter = '2026-Q1';
One transaction. One connection. ACID across both graph and relational data.
JSONL
You load both the JSONL graph and your CSV/Parquet tables into pandas or DuckDB and join manually. Doable for analytics pipelines. Not doable for a production application serving real-time requests.
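For an offline pipeline, that manual join is genuinely only a few lines. A stdlib-only sketch, assuming a hypothetical quarterly_earnings CSV with company_name, quarter, and revenue columns (mirroring the AGE example above -- the inline data stands in for real files):

```python
import csv
import io
import json

# Stand-in for quarterly_earnings.csv on disk.
earnings_csv = io.StringIO(
    "company_name,quarter,revenue\n"
    "Acme,2026-Q1,1200000\n"
    "Globex,2026-Q1,950000\n"
)
revenue = {row["company_name"]: int(row["revenue"])
           for row in csv.DictReader(earnings_csv)
           if row["quarter"] == "2026-Q1"}

# Stand-in for WORKS_AT edges read from graph.jsonl.
graph_lines = [
    '{"src":"bob","rel":"WORKS_AT","dst":"acme"}',
    '{"src":"carol","rel":"WORKS_AT","dst":"globex"}',
]
joined = []
for line in graph_lines:
    e = json.loads(line)
    if e["rel"] == "WORKS_AT":
        # JSONL ids are lowercase; the CSV uses display names.
        company = e["dst"].capitalize()
        joined.append((e["src"], company, revenue.get(company)))

print(joined)
# [('bob', 'Acme', 1200000), ('carol', 'Globex', 950000)]
```

Note the `.capitalize()` hack: without a shared key convention between the graph file and the table, you end up writing ad-hoc key normalization at every join, which is precisely the kind of glue that ACID-in-one-database spares you.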
Verdict: Round 3
Mixed Workload (Graph + Relational)
+------+------+------+
| Neo4j| AGE | JSONL|
| 3 | 9 | 4 |
+------+------+------+
AGE wins decisively here.
The "one database" story is real and valuable.
Falsification check: Neo4j added GDS (Graph Data Science) library for analytics, and version 5.x improved aggregation capabilities. But it still cannot replace a relational database for structured analytical queries. The gap is narrowing but real.
Round 4: Operational Cost
Let us stop being coy about money.
Neo4j
- Community Edition: free, but limited to a single machine (no clustering, no online backup, no role-based access control).
- Enterprise Edition: contact sales. Typically $100K+/year for production deployments. This is not a rounding error.
- AuraDB (managed): starts around $65/month for a toy instance. Production instances with adequate memory run $500-2000+/month.
- Hidden cost: JVM tuning. Neo4j runs on the JVM. Heap sizing, GC pauses, off-heap page cache configuration. You need someone who understands JVM operations.
Neo4j Cost Curve
$
| /
| /
| /
| /
| /
| . / <-- Enterprise license kicks in
| . .
| .
| .
+-----------------------------------> Data Size
Free tier "Call Sales"
Postgres + AGE
- PostgreSQL: free, open source, battle-tested.
- AGE extension: free, Apache 2.0 licensed. No licensing games.
- Managed options: any managed Postgres provider (RDS, Cloud SQL, Supabase) -- though AGE extension support varies. You may need to self-host.
- Hidden cost: AGE is still maturing. You will hit bugs. The community is smaller. You become your own support tier.
Postgres+AGE Cost Curve
$
|
| ---------------------------- (flat: infra cost only)
|
|
+-----------------------------------> Data Size
Always open source, pay for compute only
JSONL
- Cost: zero. It is files.
- Hidden cost: engineering time. Every "feature" (indexing, querying, concurrency, durability) you build yourself. At small scale this is free. At medium scale it is cheaper than Neo4j. At large scale it is more expensive than both because you are paying senior engineers to maintain a bespoke system.
JSONL Total Cost of Ownership
$
| /
| /
| / <-- Engineering time
| / dominates
| /
| /
| /
| /
| /
| /
| . . . . . . <-- Near-zero at small scale
+-----------------------------------> Complexity
Verdict: Round 4
Total Cost of Ownership
+------+------+------+
| Neo4j| AGE | JSONL|
| 4 | 8 | 7* |
+------+------+------+
* JSONL score drops to 2 above ~1M edges
Round 5: Write Performance and Ingestion
Bulk loading 1 million edges. A common real-world requirement.
Neo4j
Neo4j ships neo4j-admin import for offline bulk loading. It is fast -- millions of nodes per second -- but requires the database to be stopped. For online ingestion, UNWIND with batched transactions achieves ~50K-100K edges/second depending on hardware.
Postgres + AGE
This is where AGE currently hurts. Multiple GitHub issues document the pain:
- Issue #2177: MERGE on 10K+ edges becomes unusably slow (minutes for 70K edges)
- Issue #1925: 83K edge creation causes severe slowdown
- Issue #2198: large-table edge creation performance degrades non-linearly
AGE creates edges by inserting rows into a PostgreSQL table. Each edge insert requires lookups to validate source and target vertices. Without proper index pushdown (issue #2348), these lookups scale poorly.
For bulk loading, you can bypass Cypher and insert directly into AGE's internal tables -- but this requires understanding AGE's internal schema, which is undocumented and fragile.
JSONL
# 1 million edges
time python3 -c "
import json
for i in range(1_000_000):
print(json.dumps({'src': f'n{i}', 'rel': 'LINKS', 'dst': f'n{i+1}'}))
" > edges.jsonl
This completes in seconds. Append is O(1). No index to update, no constraint to check, no transaction to commit. You cannot beat the write speed of "just append text to a file."
But you pay for it later. Every read is now a full scan unless you build your own index.
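Building that index does not require a database -- one pass over the file to record byte offsets is enough. A sketch (the scheme and function names are my own):

```python
import json
import os
import tempfile

def build_offset_index(path):
    """One pass over the file, recording the byte offset of every edge
    line keyed by source node. Later reads become a seek plus one line
    parse instead of a full-file scan."""
    index = {}
    with open(path, "rb") as f:
        offset = 0
        for line in f:
            obj = json.loads(line)
            if "src" in obj:
                index.setdefault(obj["src"], []).append(offset)
            offset += len(line)
    return index

def edges_from(path, index, src):
    """Seek straight to each stored offset and parse only those lines."""
    results = []
    with open(path, "rb") as f:
        for offset in index.get(src, []):
            f.seek(offset)
            results.append(json.loads(f.readline()))
    return results

# Tiny demo file.
tmp = tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False)
tmp.write('{"src":"alice","rel":"KNOWS","dst":"bob"}\n')
tmp.write('{"src":"bob","rel":"WORKS_AT","dst":"acme"}\n')
tmp.close()

idx = build_offset_index(tmp.name)
print(edges_from(tmp.name, idx, "bob"))
os.unlink(tmp.name)
```

The catch, of course, is that the index lives in memory and must be rebuilt (or incrementally maintained) on every append. You have taken the first step toward writing a storage engine, which is the whole point of the trade-off.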
Verdict: Round 5
Write / Bulk Ingest Speed
+------+------+------+
| Neo4j| AGE | JSONL|
| 7 | 4 | 10 |
+------+------+------+
JSONL: unbeatable write speed, unbearable read speed
Neo4j: good with offline import, decent online
AGE: the weakest link for large ingestion
Round 6: Ecosystem and Tooling
Neo4j
- Visualization: Neo4j Browser, Bloom (enterprise), extensive third-party tools
- Language drivers: official drivers for Java, Python, JavaScript, .NET, Go
- GDS library: PageRank, community detection, pathfinding, embeddings
- APOC: 500+ utility procedures
- Community: large, mature, well-documented
Postgres + AGE
- Visualization: limited; AGE Viewer exists but is basic
- Language drivers: any PostgreSQL driver works, but handling agtype results requires extra parsing
- Graph algorithms: none built-in. You write them in Cypher or fall back to SQL
- Community: growing but small. ~2K GitHub stars vs Neo4j's ~13K+
- Documentation: improving, but gaps remain. Many "how do I do X?" questions on GitHub issues have no official answer
JSONL
- Visualization: none (build your own, or load into NetworkX/Gephi)
- Language drivers: every language has a JSON parser
- Graph algorithms: use NetworkX, igraph, or write from scratch
- Community: infinite (it is just JSON) and zero (no graph-specific tooling)
Verdict: Round 6
Ecosystem & Tooling
+------+------+------+
| Neo4j| AGE | JSONL|
| 10 | 5 | 3 |
+------+------+------+
The Honest Scorecard
Category Neo4j AGE JSONL
------------------------------------------------
Data Modeling 9 7 6
Multi-Hop Queries 10 6 3
Mixed Workloads 3 9 4
Operational Cost 4 8 7*
Write Performance 7 4 10
Ecosystem 10 5 3
------------------------------------------------
TOTAL 43 39 33
* JSONL drops to 28 at scale (cost becomes 2)
But totals are misleading. Nobody needs all six categories equally. Here is the real decision framework:
When Each Wins (Honestly)
Choose Neo4j when:
- Your core product IS the graph (social network, fraud ring detection, recommendation engine)
- You need multi-hop traversals deeper than 3 hops on 10M+ node graphs
- You can afford the license or AuraDB pricing
- You have a dedicated data engineering team for JVM tuning
- Graph algorithms (PageRank, community detection) are a primary use case
Choose Postgres + AGE when:
- You already run PostgreSQL and want to add graph queries without a second database
- Your workload is 70% relational, 30% graph
- Your graph is under 5M nodes (AGE's performance ceiling today)
- You need ACID transactions spanning graph and relational data
- Budget is a hard constraint and vendor lock-in is unacceptable
- You value the option to query graph data with SQL joins
Choose JSONL when:
- You are prototyping or exploring data before committing to infrastructure
- Your graph is small (under 100K edges) and mostly read at startup
- You need maximum portability (the data goes into git, S3, email attachments)
- You are building a data pipeline where the graph is intermediate, not the product
- You want append-only audit logs of graph mutations
- You are one person on a weekend project
When Each Loses (Honestly)
Neo4j loses when:
- You need relational analytics alongside graph queries (you end up with two databases)
- Your startup cannot justify the Enterprise license
- Your graph is small enough that PostgreSQL handles it fine
- You need your ops team to manage one less piece of infrastructure
- Your data model changes frequently (schema migrations in Neo4j are manual)
Postgres + AGE loses when:
- You need deep traversals (5+ hops) on large graphs -- VLE blows up
- You need shortestPath() on cyclic graphs at scale -- it is not BFS
- You need to bulk-ingest millions of edges quickly
- You need graph-specific algorithms (PageRank, Louvain) out of the box
- You cannot tolerate being an early adopter of a maturing extension
JSONL loses when:
- You need concurrent writes from multiple processes
- You need real-time traversal queries on graphs larger than RAM
- You need ACID guarantees
- You need anyone other than the original author to query the data
- You are building anything that will be in production for more than 6 months
The Falsification Test
Every claim above should be testable. Here are the ones I am least certain about:
- "AGE's VLE is O(n^k)" -- This is documented in AGE issue #2349, but the AGE team may fix it. If you are reading this post after mid-2026, check whether BFS-based path expansion has been merged.
- "Neo4j's index-free adjacency makes it O(1) per hop" -- Strictly true for cache-hot data. When the graph exceeds page cache, it degrades to random disk I/O, which is closer to O(log n) in practice. Neo4j's marketing overstates this.
- "JSONL beats all for write speed" -- True for raw append. But if you need to deduplicate on write, you lose this advantage entirely. Dedup requires an index, and now you are building a database.
- "Postgres+AGE is free" -- The software is free. But if you need graph algorithm support comparable to Neo4j GDS, you will either write it yourself (expensive) or use a separate tool (complexity cost).
Closing Thought
The question is never "which is the best graph database?" The question is "how much of my problem is actually a graph problem?"
If the answer is "almost all of it" -- Neo4j. If the answer is "some of it, alongside relational needs" -- Postgres+AGE. If the answer is "I'm not sure yet" -- start with JSONL, learn your access patterns, then migrate to the right tool.
The worst outcome is choosing Neo4j for a problem that is 80% relational, or choosing JSONL for a problem that needs real-time 5-hop traversals. Both are expensive mistakes, just in different currencies: money for the first, engineering time for the second.
Decision Flow:
Is your core product a graph?
|
+-- YES --> Can you afford Neo4j Enterprise?
| |
| +-- YES --> Neo4j
| +-- NO --> Neo4j Community (single node)
| or Postgres+AGE
|
+-- NO --> Do you need graph queries in production?
|
+-- YES --> Postgres+AGE
+-- NO --> JSONL (prototype, then decide)
Pick based on where your problem actually lives, not where the marketing says it should.
Director of Agentic AI for the Enterprise at Writer. Building at the intersection of language, intelligence, and design.