Brooklet: The SQLite of Event Streaming
I've been using Claude Code heavily since it came out, and one thing I noticed is that every session gets quietly recorded as a JSONL file:
```
~/.claude/projects/-Users-you-your-project/
├── 3769f69b-a0a9-4cb5-9280-9f43771d4ebd.jsonl  # 1.2MB
├── 69fa079e-c410-407b-b1ad-0041d77b3027.jsonl  # 1.1MB
└── ...92 more session files
```
Each line is a JSON event. User messages, assistant responses with token counts, tool calls, hook firings. Across 75 sessions on one project, I found 1.1 billion cache read tokens and 2,871 Bash calls sitting there in plain text. I had no idea until I went looking.
A few things stood out from those numbers. The 1.1 billion cache read tokens show how aggressively Anthropic is using prompt caching behind the scenes. Cache reads are significantly cheaper than fresh input tokens, so Anthropic is actively working to keep costs down for heavy users. And the 2,871 Bash calls across 75 sessions mean the agent executes roughly 38 shell commands per session on average.
And that's just one project. This particular repo is my personal knowledge management vault, an Obsidian second brain that I manage almost entirely through Claude Code. I have 21 projects total. If you're someone who uses Claude Code with Obsidian, or across multiple repos, you've probably got hundreds of session files with data you've never looked at.
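Totals like the cache-read figure above are a one-off script away, even with nothing but the standard library. A rough sketch (the `message.usage.cache_read_input_tokens` path is my assumption about Claude Code's event layout, not something defined by this post's tooling):

```python
import json

def total_cache_reads(paths):
    """Sum cache_read_input_tokens across session JSONL files."""
    total = 0
    for path in paths:
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue
                # Assumed layout: assistant events carry a message.usage dict.
                msg = json.loads(line).get("message")
                if isinstance(msg, dict):
                    usage = msg.get("usage") or {}
                    total += usage.get("cache_read_input_tokens", 0)
    return total
```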
The problem is there's no tooling around these files. You can cat them and pipe through jq, but what I wanted to ask was:
"Show me new events since I last checked."
```python
import brooklet

stream = brooklet.open("./my-streams")
stream.register("sessions", path="~/.claude/projects/*/*.jsonl", mode="glob")

# First run processes everything
for event in stream.consume("sessions", group="my-app"):
    process(event)

# Second run picks up only new events. Offsets are remembered per consumer group.
for event in stream.consume("sessions", group="my-app"):
    print("New:", event)
```
"Tail all sessions for anomalies."
```python
# follow=True tails the files like tail -f, but with offset tracking
for event in stream.consume("sessions", group="anomaly-detector", follow=True):
    if looks_like_tool_loop(event):
        alert(event)
```
"Process each session exactly once, then pick up new ones tomorrow."
```python
# Run this daily. Consumer group "daily-etl" remembers where it left off.
for event in stream.consume("sessions", group="daily-etl"):
    load_into_duckdb(event)
# Tomorrow it resumes from the last offset, no reprocessing.
```
This isn't specific to Claude Code. JSONL is everywhere in AI tooling. OpenAI and Anthropic batch APIs, OpenTelemetry file exporters, structlog, LangSmith traces. The files exist. What's missing is the coordination layer.
So I built brooklet last weekend, with Claude Code. It's a lightweight Python library that treats JSONL files as event streams.
Kafka Without the Kafka
I've worked with Kafka on and off over the years. The core abstraction is elegant: an append-only log with offset-based consumption. JSONL is already an append-only log. The gap isn't the log format. It's the coordination layer: consumer offsets, live tailing, topic discovery. The relationship to Kafka is like SQLite to PostgreSQL. Same foundational insight; completely different scope. No JVM, no broker process, no ZooKeeper. Your "message broker" is the filesystem.
One thing I like about this design is that JSONL goes in and JSONL comes out. Every Unix tool works on it at every stage: cat, tail -f, jq, grep. You can debug a topic by reading the file. You can git diff it to see what changed between runs. And because brooklet's produce() also writes JSONL, you can chain consumers together or slot a plain jq filter or a Python script anywhere in the pipeline without any format conversion.
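Because every stage speaks JSONL, a pipeline stage can be as small as a filter function that reads lines in and writes lines out. A minimal sketch of the idea in plain stdlib Python (this is not brooklet's API, just the shape of a stage you could slot between consume() and produce(), or next to a jq filter):

```python
import json
import sys

def jsonl_filter(lines, predicate):
    """Pass through only the JSONL lines whose decoded event satisfies predicate."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if predicate(event):
            # Re-emit as JSONL so the next stage (brooklet, jq, grep) can read it.
            yield json.dumps(event)

if __name__ == "__main__":
    # Example: keep only assistant events when run as a pipe stage.
    for out in jsonl_filter(sys.stdin, lambda e: e.get("type") == "assistant"):
        print(out)
```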
Unix tools already work on JSONL, but they have no memory. cat reads the whole file every time. tail -f tails one file but forgets where it was if you restart. jq is a filter, not a consumer. None of them can answer "what's new since last time?" Brooklet adds one thing on top: stateful consumption. The consumer group remembers its position, so the second run only processes new events. That's the difference between a one-off script and a continuous pipeline you can run daily without reprocessing everything.
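The core of that statefulness is small: remember how far you read, and start there next time. A minimal sketch of offset tracking with a sidecar file (my illustration of the concept, not brooklet's implementation, which tracks position by sequence number rather than byte offset):

```python
import os

def consume_new(path, offset_path):
    """Yield lines appended to path since the last call, remembering a byte offset."""
    offset = 0
    if os.path.exists(offset_path):
        with open(offset_path) as f:
            offset = int(f.read() or 0)
    with open(path) as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line:
                break
            if not line.endswith("\n"):
                break  # partially written last line; pick it up next run
            yield line.rstrip("\n")
            offset = f.tell()
    # Persist the position so the next run only sees new lines.
    with open(offset_path, "w") as f:
        f.write(str(offset))
```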
What You Can Build With It
To give a concrete example, I built a CLI tool called brooklet-scout on top of the library. It consumes Claude Code session files, aggregates token usage and tool calls, and displays a live dashboard that updates as I work:
```
Scout Analytics Dashboard
┏━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Session   ┃ Events ┃ Duration      ┃ Model             ┃ Input Tokens ┃ Output Tokens ┃ Top Tools                                 ┃
┡━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 95f794a5  │ 609    │ 46.8h (17m)   │ claude-opus-4-6   │ 1,939        │ 37,298        │ Bash(26), Read(14), Edit(14)              │
│ 3769f69b  │ 2357   │ 119.2h (2.8h) │ claude-opus-4-6   │ 1,523        │ 108,875       │ Read(72), Bash(65), Edit(28)              │
├───────────┼────────┼───────────────┼───────────────────┼──────────────┼───────────────┼───────────────────────────────────────────┤
│ TOTAL (2) │ 2966   │ 166.0h        │ claude-opus-4-6=2 │ 3,462        │ 146,173       │ Bash(91), Read(86), Edit(42), Write(33)   │
└───────────┴────────┴───────────────┴───────────────────┴──────────────┴───────────────┴───────────────────────────────────────────┘
```
That's two active sessions on the same project, with wall-clock time vs active time (notice session 3769f69b has been "open" for 119 hours but only 2.8 hours of that is actual activity), tool breakdowns, and token counts. The whole thing is about 550 lines of Python sitting on top of brooklet's consume() and produce().
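At its heart, that aggregation is a fold over the event stream: group by session, count events, sum tokens, tally tools. A sketch of the shape (the `sessionId` and `message.usage` field names are my assumptions about the session format, not scout's actual code):

```python
from collections import Counter

def aggregate(events):
    """Fold a stream of session events into per-session stats."""
    stats = {}
    for event in events:
        sid = event.get("sessionId", "unknown")
        s = stats.setdefault(sid, {"events": 0, "output_tokens": 0, "tools": Counter()})
        s["events"] += 1
        msg = event.get("message")
        if not isinstance(msg, dict):
            continue
        usage = msg.get("usage") or {}
        s["output_tokens"] += usage.get("output_tokens", 0)
        content = msg.get("content")
        if isinstance(content, list):
            for item in content:
                if isinstance(item, dict) and item.get("type") == "tool_use":
                    s["tools"][item.get("name", "?")] += 1
    return stats
```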
The Envelope
Every event gets a thin metadata envelope auto-injected:
| Field | Description | Behavior |
|---|---|---|
| `_ts` | ISO 8601 timestamp | Set if missing, preserved if present |
| `_seq` | Monotonic sequence number | Always set by brooklet |
| `_src` | Producer identifier | Set from topic name or source param |
The _ prefix avoids collisions with any producer's payload. Consumer groups track position by _seq, not line count, so segment rotation won't break offsets.
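A sketch of what that injection amounts to, reconstructed from the table above (not brooklet's source):

```python
from datetime import datetime, timezone

def wrap(event, seq, src):
    """Inject the envelope fields into a payload dict, per the table above."""
    out = dict(event)
    # _ts: set only if the producer didn't supply one
    out.setdefault("_ts", datetime.now(timezone.utc).isoformat())
    # _seq: always assigned, monotonically increasing per topic
    out["_seq"] = seq
    # _src: topic name or an explicit source identifier
    out["_src"] = src
    return out
```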
What I Found in My Own Data
Running scout against all my Claude Code sessions for one project:
```
=== cumulative totals (75 sessions) ===
tokens: input=171,012 output=1,084,419 cache_read=1,121,426,746
models: claude-opus-4-6=71 claude-sonnet-4-5=1
tools: Bash=2871 Read=1425 Edit=586 Write=473 Glob=323
sessions: avg=800 events, avg=7.5h duration (active: 4.5h)
```
A few things jumped out:
- 1.1 billion cache read tokens across 75 sessions on one project. Prompt caching is doing serious work.
- One session had 17,498 events and 1,067 Bash calls over 68 hours. That was an autonomous loop I left running overnight.
- Bash and Read dominate tool usage. Makes sense for a knowledge management vault where I'm mostly reading, grepping, and running scripts.
- Average active time is 4.5 hours vs 7.5 hours wall-clock. The idle gap detection (5-minute threshold) shows how much time is me reading, thinking, or getting coffee.
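The idle-gap computation behind that active-time number is simple: sum only the gaps between consecutive events that fall under the threshold. A sketch, assuming events carry ISO 8601 timestamps (the threshold default mirrors the 5-minute figure above):

```python
from datetime import datetime

def active_seconds(timestamps, idle_threshold=300):
    """Sum gaps between consecutive events, skipping gaps above idle_threshold seconds."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    active = 0.0
    for prev, cur in zip(times, times[1:]):
        gap = (cur - prev).total_seconds()
        if gap <= idle_threshold:
            active += gap
        # Gaps above the threshold count as idle (reading, thinking, coffee).
    return active
```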
The --current --follow --rich mode gives a live dashboard that updates as you work. I can see tokens accumulating and tools being called in real-time during a Claude Code session. It's useful for catching when an agent is stuck in a loop.
Try It
```shell
uv add brooklet  # or: pip install brooklet
```
```python
import brooklet

stream = brooklet.open("./my-streams")
stream.register("claude-history", path="~/.claude/history.jsonl", mode="single-file")
for event in stream.consume("claude-history", group="explorer"):
    print(event["_seq"], list(event.keys())[:5])
```
The full API is five methods: open(), register(), consume(), produce(), topics(). The library is about 450 lines of Python with one dependency (watchdog for filesystem watching in follow mode).
Brooklet is open source: github.com/JoshuaOliphant/brooklet