Coding Agents — Architecture & Vendor Authoring

This page is for contributors. If you just want to send Claude Code or Cursor telemetry into OpenLit, the Onboarding page is what you want. Read on if you’re:

- Adding a new vendor adapter (Claude Code, Codex, Windsurf, …).
- Touching the rollup / materializer that powers the /agents and /coding-agents views.
- Debugging missing rows, empty pills, or doubled sessions.

This doc is paired with the in-repo Cursor rule .cursor/rules/coding-agents-hook.mdc. The rule auto-loads when you edit anything under cli/internal/coding/, sdk/go/semconv/, or the coding-agents query layer. The rule is the canonical authoring checklist; this doc is the conceptual overview.

Data flow

Vendor hook event ──► openlit CLI ──► OTLP/HTTP ──► OpenLit collector
                          │                              │
                          ▼                              ▼
                  sessionstate cache              otel_traces (ClickHouse)
                  (XDG cache dir)                        │
                                                        ▼
                                          materializer (cron)
                                                        │
                                                        ▼
                                          openlit_agents_summary
                                                        │
                                                        ▼
                                          /agents + /coding-agents UI

Each developer machine runs one openlit CLI binary. The vendor’s plugin / hook system invokes openlit hook <vendor> on every event; the CLI is short-lived (one process per event) so anything stateful lives in the session-state cache at $XDG_CACHE_HOME/openlit/sessions/<session-id>.json.
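As a rough illustration of the cache layout, the path for one session could be resolved as below. This is a minimal sketch, not the CLI's actual code: `sessionStatePath` is a hypothetical helper, and the fallback to `~/.cache` when `$XDG_CACHE_HOME` is unset is an assumption taken from the XDG Base Directory convention.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// sessionStatePath (hypothetical) returns the cache file for one session,
// following the $XDG_CACHE_HOME/openlit/sessions/<session-id>.json layout.
// cacheHome is the value of $XDG_CACHE_HOME and homeDir the user's home
// directory; when cacheHome is empty we assume the XDG default of ~/.cache.
func sessionStatePath(cacheHome, homeDir, sessionID string) string {
	if cacheHome == "" {
		cacheHome = filepath.Join(homeDir, ".cache")
	}
	// filepath.Base drops directory components, so a session id containing
	// path separators cannot point outside the sessions directory.
	name := filepath.Base(sessionID) + ".json"
	return filepath.Join(cacheHome, "openlit", "sessions", name)
}

func main() {
	home, _ := os.UserHomeDir()
	fmt.Println(sessionStatePath(os.Getenv("XDG_CACHE_HOME"), home, "abc123"))
}
```

Because every hook invocation is a fresh process, each one would recompute this path, read the file, and write it back before exiting.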

Identity model

| Concept | Where it lives | Stable across |
| --- | --- | --- |
| `session.id` | Span + resource attr `coding_agent.session.id` | One process / hook invocation |
| `conversation.id` | Resource attr `gen_ai.conversation.id` | Multiple sessions when the vendor reports a stable chat id |
| `chat_id` (UI) | Derived: `coalesce(parent_id, session_id)` | One chat thread including its subagents |
| `agent_key` | Computed `computeAgentKey(cluster, env, vendor)` | One per vendor per environment |
| `user` | Resource attr `gen_ai.user.name` | One developer per machine |

The UI rolls up at chat_id. The materializer rolls up at chat_id, then by vendor. The hub on /agents shows one row per vendor.
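The two-level rollup can be pictured with a small in-memory sketch. The `span` struct, `chatID`, and `chatsPerVendor` are hypothetical names used only for illustration, and treating an empty string as "missing" is an assumption about how the coalesce behaves on absent attributes.

```go
package main

import "fmt"

// span is a hypothetical, trimmed-down view of the identity attributes.
type span struct {
	SessionID string // coding_agent.session.id
	ParentID  string // parent chat id; empty for a top-level chat
	Vendor    string // vendor identifier, e.g. "cursor"
}

// chatID mirrors coalesce(parent_id, session_id), treating "" as missing.
func chatID(s span) string {
	if s.ParentID != "" {
		return s.ParentID
	}
	return s.SessionID
}

// chatsPerVendor groups spans by chat id first, then counts distinct
// chats under each vendor.
func chatsPerVendor(spans []span) map[string]int {
	seen := map[string]map[string]bool{} // vendor -> set of chat ids
	for _, s := range spans {
		if seen[s.Vendor] == nil {
			seen[s.Vendor] = map[string]bool{}
		}
		seen[s.Vendor][chatID(s)] = true
	}
	counts := make(map[string]int, len(seen))
	for vendor, chats := range seen {
		counts[vendor] = len(chats)
	}
	return counts
}

func main() {
	spans := []span{
		{SessionID: "s1", Vendor: "cursor"},
		{SessionID: "s2", ParentID: "s1", Vendor: "cursor"}, // subagent of s1
		{SessionID: "s3", Vendor: "claude-code"},
	}
	fmt.Println(chatsPerVendor(spans))
}
```

In this sketch the subagent session s2 folds into its parent chat s1, so Cursor reports one chat rather than two.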

Capture modes

OPENLIT_CODING_CONTENT_CAPTURE controls what lands on spans. Pick one:

| Mode | Identifiers | Tool names + paths | File diffs / message bodies |
| --- | --- | --- | --- |
| `minimal` | ✅ | ❌ | ❌ |
| `metadata_only` | ✅ | ✅ | ❌ |
| `full` | ✅ | ✅ | ✅ |

Two redaction tiers sit on top of the mode: a token-pattern scrubber (tier 1), which always runs regardless of mode, and a body-scope scrubber (tier 2), which only runs in full because that is the only mode that captures bodies. See cli/internal/redact/redact.go. The mode is stamped as the resource attribute coding_agent.content_capture_mode so audit trails can prove which mode a session was recorded under.
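The mode table can be read as two boolean gates. The sketch below is illustrative only: `bodyAllowed` borrows the name of the helper in cli/internal/otlp/attrs.go, but this body is a guess at its behavior, and `parseMode`, `toolMetadataAllowed`, and the fall-back-to-minimal rule for unknown values are assumptions.

```go
package main

import "fmt"

type captureMode string

const (
	modeMinimal      captureMode = "minimal"
	modeMetadataOnly captureMode = "metadata_only"
	modeFull         captureMode = "full"
)

// parseMode maps the raw env value onto a known mode. Falling back to the
// most restrictive mode on unknown input is an assumption of this sketch.
func parseMode(raw string) captureMode {
	switch captureMode(raw) {
	case modeMinimal, modeMetadataOnly, modeFull:
		return captureMode(raw)
	default:
		return modeMinimal
	}
}

// toolMetadataAllowed reports whether tool names and paths may be recorded.
func toolMetadataAllowed(m captureMode) bool {
	return m == modeMetadataOnly || m == modeFull
}

// bodyAllowed reports whether file diffs and message bodies may be recorded.
func bodyAllowed(m captureMode) bool {
	return m == modeFull
}

func main() {
	for _, raw := range []string{"minimal", "metadata_only", "full", "typo"} {
		m := parseMode(raw)
		fmt.Printf("%-14s tools=%v bodies=%v\n", raw, toolMetadataAllowed(m), bodyAllowed(m))
	}
}
```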

Vendor adapter contract

Each adapter under cli/internal/coding/hook/<vendor>/ is responsible for:

1. Parsing the vendor’s JSON payload into Go structs.
2. Picking the session id (vendor-specific key name).
3. Mapping the vendor’s event semantics onto our normalize types:
   - Session: sessionStart / sessionEnd lifecycle.
   - LLMTurn: one LLM call (prompt + completion).
   - ToolCall: one tool invocation by the agent.
   - EditDecision: accept / reject / undo of a code edit.
   - ShellRequest: agent-issued shell command.
   - Event: generic counter or “something happened” marker.
4. Calling the matching emit.Emit* method on the OTLP emitter.

The adapter does not decide what to redact, what to drop, or what trace id to use. Those live in the OTLP layer (cli/internal/otlp/{attrs,exporter,sampler,tracecontext}.go) so adapters stay narrow and consistent.
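To make the contract concrete, here is a minimal sketch of an adapter for an imaginary vendor. Everything in it is an assumption for illustration: the `payload` field names, the event names, the two-method `Emitter` interface, and the `recorder` stand-in. The real normalize types and emitter carry many more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed-down stand-ins for two of the normalize types.
type Session struct{ ID, Outcome string }
type ToolCall struct{ SessionID, Tool string }

// Emitter is a hypothetical slice of the OTLP emitter surface.
type Emitter interface {
	EmitSession(Session)
	EmitToolCall(ToolCall)
}

// payload is a made-up vendor hook payload; real vendors use other keys.
type payload struct {
	Event     string `json:"event"`
	SessionID string `json:"session_id"`
	ToolName  string `json:"tool_name"`
}

// handle parses one hook event, maps it onto a normalize type, and calls
// the matching Emit* method. It makes no redaction or sampling decisions.
func handle(raw []byte, emit Emitter) error {
	var p payload
	if err := json.Unmarshal(raw, &p); err != nil {
		return fmt.Errorf("parse payload: %w", err)
	}
	switch p.Event {
	case "sessionStart":
		emit.EmitSession(Session{ID: p.SessionID, Outcome: "started"})
	case "sessionEnd":
		emit.EmitSession(Session{ID: p.SessionID, Outcome: "completed"})
	case "postToolUse":
		emit.EmitToolCall(ToolCall{SessionID: p.SessionID, Tool: p.ToolName})
	}
	// Unknown events are ignored in this sketch.
	return nil
}

// recorder is a test double that logs what would have been emitted.
type recorder struct{ log []string }

func (r *recorder) EmitSession(s Session)   { r.log = append(r.log, "session:"+s.Outcome) }
func (r *recorder) EmitToolCall(t ToolCall) { r.log = append(r.log, "tool:"+t.Tool) }

func main() {
	r := &recorder{}
	raw := []byte(`{"event":"postToolUse","session_id":"s1","tool_name":"edit_file"}`)
	if err := handle(raw, r); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(r.log)
}
```

Keeping the adapter this narrow is what lets redaction, sampling, and trace-context decisions stay in one place for every vendor.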

Common pitfalls (and what to do instead)

Empty `/agents` “Coding Agents” tab

Cause: the materializer’s discovery query has an aggregate in GROUP BY, or a CTE that inlines into nested aggregates. The materializer’s /api/agents/materialize route returns processed: N for other rows while the coding-vendor row silently errors. Fix: rewrite the discovery query to use a per-row chat_id expression (map lookups + coalesce, no any()) and group by it directly. See discoverCodingAgents in src/client/src/lib/platform/agents/materialize.ts.

Empty Repository / Working Folder pills on tool / llm spans

Cause: Cursor only sends workspace_roots on session lifecycle events. Per-tool hooks come in without any cwd field, and os.Getwd() wasn’t persisted to the session-state cache. Fix: in hook.go, when cached.CWD == "", resolve os.Getwd() and write it back to cached.CWD before calling git.Snapshot. The save gate already covers cached.CWD != "".
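The shape of that fix, in a hedged sketch: `cachedSession` and `ensureCWD` are hypothetical names, and the working-directory lookup is injected as a function so the logic can be exercised without touching the real process state.

```go
package main

import (
	"fmt"
	"os"
)

// cachedSession is a hypothetical subset of the session-state cache entry.
type cachedSession struct {
	CWD string
}

// ensureCWD fills in CWD when the cache has none and reports whether the
// entry changed and therefore needs to be saved back to disk.
func ensureCWD(c *cachedSession, getwd func() (string, error)) bool {
	if c.CWD != "" {
		return false // lifecycle event already supplied a working directory
	}
	dir, err := getwd()
	if err != nil || dir == "" {
		return false // nothing usable; leave the pill empty
	}
	c.CWD = dir
	return true
}

func main() {
	c := &cachedSession{}
	if ensureCWD(c, os.Getwd) {
		fmt.Println("persist cwd:", c.CWD)
	}
}
```

Later tool or LLM hooks for the same session would then find the directory in the cache and populate the Repository and Working Folder pills from it.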

One chat appears as N sessions

Cause: a subagent is reporting its own session_id but not the parent’s parent_conversation_id, so the chat-thread coalesce falls through to per-process session_id. Fix: confirm peekContext is picking the vendor’s parent-id field, and that it’s promoted to the coding_agent.agent.parent_id resource attribute in the sessionAttrs block of hook.go.

Double session-root span

Cause: EmitSession is emitting a root span on both sessionStart (“started”) and sessionEnd (outcome=“completed” / “errored”). Fix: only emit on End events. The started / in_progress outcomes are no-ops in EmitSession — they only update the session-state cache.
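A minimal sketch of the intended behavior follows. The `sessionEmitter` type and its fields are hypothetical; in particular, the in-memory `cache` map stands in for the on-disk session-state cache.

```go
package main

import "fmt"

// sessionEmitter is a hypothetical stand-in for the OTLP emitter.
type sessionEmitter struct {
	cache     map[string]string // session id -> last known outcome
	rootSpans []string          // root spans that would be exported
}

func newSessionEmitter() *sessionEmitter {
	return &sessionEmitter{cache: map[string]string{}}
}

// EmitSession always records the outcome in the cache, but only terminal
// outcomes produce a session-root span.
func (e *sessionEmitter) EmitSession(sessionID, outcome string) {
	e.cache[sessionID] = outcome
	switch outcome {
	case "completed", "errored":
		e.rootSpans = append(e.rootSpans, sessionID+":"+outcome)
	default:
		// "started" and "in_progress" update the cache only.
	}
}

func main() {
	e := newSessionEmitter()
	e.EmitSession("s1", "started")
	e.EmitSession("s1", "in_progress")
	e.EmitSession("s1", "completed")
	fmt.Println(len(e.rootSpans), e.rootSpans)
}
```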

`coding_active_users_24h` is always 0 on a single-user install

Cause: COHORT_K_FLOOR was applied at materialize time. Since the single developer is below the floor, the count is masked to 0 in the materialized table. Fix: don’t apply the floor in the materializer. Store the raw count. Apply the floor at query time in queries.ts where auth context is available, so admins see the truth and viewers see the masked view.
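The split between stored and displayed counts can be sketched as a single read-time function. It is written in Go purely for illustration (the real logic lives in queries.ts), and `visibleActiveUsers`, its parameters, and the mask-to-zero behavior for viewers are assumptions.

```go
package main

import "fmt"

// visibleActiveUsers applies the cohort floor at read time. The raw count
// is what the materializer stored; admins see it unchanged, while viewers
// see 0 whenever the cohort is smaller than kFloor.
func visibleActiveUsers(raw int, isAdmin bool, kFloor int) int {
	if isAdmin || raw >= kFloor {
		return raw
	}
	return 0
}

func main() {
	const kFloor = 5
	fmt.Println("admin, 1 user: ", visibleActiveUsers(1, true, kFloor))
	fmt.Println("viewer, 1 user:", visibleActiveUsers(1, false, kFloor))
	fmt.Println("viewer, 7 users:", visibleActiveUsers(7, false, kFloor))
}
```

Storing the raw value keeps the masking decision reversible: changing the floor or a user's role does not require re-materializing the table.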

`active_users_24h` shows 0 in queries even after fix

Cause: USER_EXPR doesn’t include the vendor’s identity key. For Cursor, identity arrives as user.email on tool spans and gen_ai.user.name as a resource attribute on every span. Fix: confirm the resource attribute is set (check sessionstate/<sid>.json for a user field). If not, the vendor’s payload didn’t include identity and we need to widen peekContext’s key list.

Agent detail page renders the SDK layout for a coding agent

You click into a Cursor (or Claude Code, …) row in the Coding Agents tab and, instead of the Overview / Sessions / Users tabs, you get the generic Overview / Dashboard / Monitoring / Definition / Configuration tabs from the SDK/Controller detail page. Refresh a few times and the layout flips back.

Cause: two rows exist in openlit_agents_summary with the same agent_key but different source values, one coding and one sdk. The CLI sets service.name = '<vendor>' and emits through the openlit-go SDK, so the SDK discovery in materialize.ts:discoverAgents picks up the same service.name unless it is explicitly excluded. The summary table is a ReplacingMergeTree(updated_at), so FINAL on read picks whichever row was written most recently, and the detail page chooses its layout from agent.source.

Fix: the SDK discovery query must filter out CLI rows by checking that ResourceAttributes['coding_agent.session.id'] and SpanAttributes['coding_agent.session.id'] are both empty. When adding a new vendor, sanity-check that exactly one row exists per agent in the summary table:

SELECT agent_key, source, count() AS rows
FROM openlit_agents_summary
WHERE service_name = '<vendor>'
GROUP BY agent_key, source;

A single row with source='coding' is correct. Two rows (coding + anything else) means the exclusion isn’t catching the vendor’s spans — usually because the vendor’s CLI is emitting some spans before the session id is stamped as a resource attribute.

Claude Code: dual-path coalesce

Claude Code is the first vendor where users can send telemetry to OpenLit through two independent paths at the same time:

1. Hook path (recommended primary). The openlit-cc plugin sits in ~/.claude/plugins/openlit-cc/ and is invoked on every SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, Stop, SubagentStop, and SessionEnd hook event. The adapter in cli/internal/coding/hook/claudecode/handle.go normalises each event into our canonical coding_agent.* / gen_ai.* schema. It tails the transcript JSONL for authoritative token + cost numbers at session end.
2. Native OTel path (optional, complementary). When the user exports CLAUDE_CODE_ENABLE_TELEMETRY=1 (plus the usual OTLP endpoint / headers), Claude Code itself emits metrics and events, plus spans when CLAUDE_CODE_ENHANCED_TELEMETRY_BETA=1 is also set, all under service.name=claude-code and the claude_code.* attribute namespace.

The two paths overlap. OpenLit dedupes per session.id at read time. The query layer (queries.ts, materialize.ts) coalesces both schemas into one chat row:

- Cost / tokens: native wins when present (authoritative Anthropic API numbers). Hook tail-of-transcript is the fallback estimate.
- Repository / working folder / branch: hook wins. Claude Code’s native exporter does not see the user’s cwd or run git.Snapshot(cwd); only the openlit hook does.
- User identity: either path. When the user is OAuth-authenticated in Claude Code, the native path supplies user.email as a resource attribute; the hook path supplies gen_ai.user.name from the local env. Both are coalesced into the canonical user column.
- Vendor identifier: any of coding_agent.client = 'claude-code' (stamped by hook), gen_ai.agent.name = 'claude-code', or service.name = 'claude-code' (native default). The materializer recognises all three.

Every adapter stamps coding_agent.signal_source = "hook" on its resource attributes. The native path leaves the attribute unset; the query layer infers "native" from service.name = 'claude-code'. The SDK discovery in materialize.ts:discoverAgents excludes any span that carries coding_agent.session.id AND telemetry.sdk.name = 'openlit', so the openlit hook spans are never double-materialized as SDK rows. Native Claude Code spans use Claude Code’s own SDK name (not openlit), so they too are excluded from SDK discovery.
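The precedence rules for the dual-path coalesce can be sketched as a merge of two partial views of the same chat. This is written in Go purely for illustration (the real coalesce happens in the query layer); `chatFacts`, `coalesceChat`, and `firstNonEmpty` are hypothetical, and the order in which the two user values are tried is an assumption, since either path may supply identity.

```go
package main

import "fmt"

// chatFacts is a hypothetical per-path view of one chat.
type chatFacts struct {
	CostUSD float64
	HasCost bool   // false when the path reported no cost at all
	Repo    string // only the hook path can populate this
	User    string
}

// firstNonEmpty returns the first non-empty value, or "" if none.
func firstNonEmpty(vals ...string) string {
	for _, v := range vals {
		if v != "" {
			return v
		}
	}
	return ""
}

// coalesceChat merges the hook and native views of the same session:
// native cost wins when present, repository comes from the hook, and the
// user is taken from whichever path has one (hook first in this sketch).
func coalesceChat(hook, native chatFacts) chatFacts {
	out := chatFacts{
		Repo: hook.Repo,
		User: firstNonEmpty(hook.User, native.User),
	}
	switch {
	case native.HasCost:
		out.CostUSD, out.HasCost = native.CostUSD, true
	case hook.HasCost:
		out.CostUSD, out.HasCost = hook.CostUSD, true
	}
	return out
}

func main() {
	hook := chatFacts{CostUSD: 0.40, HasCost: true, Repo: "acme/api"}
	native := chatFacts{CostUSD: 0.42, HasCost: true, User: "dev@example.com"}
	fmt.Printf("%+v\n", coalesceChat(hook, native))
}
```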

Native Claude Code metric mapping

Claude Code’s native exporter emits a stable set of metrics under the claude_code.* namespace. The openlit CLI emits the same observable behavior under the canonical coding_agent.* namespace so a single OpenLit installation can ingest both streams interchangeably and any custom Grafana / Mimir dashboard the operator already has continues to work. The mapping:

| Native Claude Code metric | Canonical openlit-CLI metric | Notes |
| --- | --- | --- |
| `claude_code.session.count` | `coding_agent.session.count` | Counter; vendor-tagged via `coding_agent.client`. Native path is unitless; openlit-CLI counter is `{session}`. |
| `claude_code.code_edit_tool.decision` | `coding_agent.edit.decision.count` | Counter; tags include `coding_agent.edit.decision` (`accept` / `auto_accepted` / `reject`) and `coding_agent.edit.source`. |
| `claude_code.lines_of_code.count` | `coding_agent.lines_of_code.count` | Counter; tags include `coding_agent.edit.decision` so the same metric powers “lines added” and “lines accepted vs rejected” without two emitters. |
| `claude_code.commit.count` | `coding_agent.commit.count` | Counter; the openlit-CLI adapters detect commits from Bash / shell / local_shell hooks and stamp the same canonical tags. |
| `claude_code.pull_request.count` | `coding_agent.pull_request.count` | Counter; gated on `gh pr create` / `gh pr new` detection so it survives copy-pasted aliases. |
| `claude_code.token.usage` | `gen_ai.usage.input_tokens` / `gen_ai.usage.output_tokens` | OpenLit follows the OTel GenAI semantic conventions for token usage. Tags: `gen_ai.system`, `gen_ai.request.model`. |
| `claude_code.cost.usage` | `gen_ai.usage.cost` | Counter, USD. Named in the style of the OTel GenAI `gen_ai.usage.*` attributes. |
| `claude_code.active_time.total` | (derived) | Active time is rolled up at query time as `duration_ms = ended_at - started_at` on the session-root span; we deliberately don’t emit it as a separate counter because it is recoverable from the span timings. |
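For dashboards or tooling that need to translate between the two namespaces, the table above can be expressed as a lookup. This is only a sketch: `nativeToCanonical` and `canonicalNames` are hypothetical names, and the metric names are copied from the table rather than from code.

```go
package main

import "fmt"

// nativeToCanonical mirrors the mapping table. claude_code.active_time.total
// is deliberately absent because it is derived from span timings at query
// time rather than emitted as a counter.
var nativeToCanonical = map[string][]string{
	"claude_code.session.count":           {"coding_agent.session.count"},
	"claude_code.code_edit_tool.decision": {"coding_agent.edit.decision.count"},
	"claude_code.lines_of_code.count":     {"coding_agent.lines_of_code.count"},
	"claude_code.commit.count":            {"coding_agent.commit.count"},
	"claude_code.pull_request.count":      {"coding_agent.pull_request.count"},
	"claude_code.token.usage":             {"gen_ai.usage.input_tokens", "gen_ai.usage.output_tokens"},
	"claude_code.cost.usage":              {"gen_ai.usage.cost"},
}

// canonicalNames returns the canonical metric names for a native metric,
// and false when the metric has no directly emitted counterpart.
func canonicalNames(native string) ([]string, bool) {
	names, ok := nativeToCanonical[native]
	return names, ok
}

func main() {
	for _, m := range []string{"claude_code.token.usage", "claude_code.active_time.total"} {
		names, ok := canonicalNames(m)
		fmt.Println(m, "->", names, ok)
	}
}
```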

All canonical counters are emitted via the openlit-go MeterProvider in cli/internal/otlp/metrics.go and pass through the same OTLP exporter as the spans, so a Prometheus / Mimir / Datadog backend attached to the OpenLit collector sees the same data as the trace backend.

Does coalesce-at-read scale? Yes for current OpenLit volumes (under ~1M coding-agent spans per tenant per day). ClickHouse evaluates the per-row coalesce in a few milliseconds; the query bottleneck is the wider session scan, not the attribute lookup. Above that volume we’ll ship a coding_agent_sessions materialized view that normalises both schemas at ingest. That deferral is documented inline in queries.ts next to the E1 deferral note.

Adding a new vendor

The Cursor, Claude Code, and Codex adapters are the three reference implementations. Each one models a different vendor shape:

- Cursor: IDE extension; 13+ named hook events, no per-session transcript file, prompt/response/thought arrive as separate hooks.
- Claude Code: CLI / IDE plugin; richer hook events plus a per-session JSONL transcript that’s the authoritative source for per-turn LLM content + token usage. The adapter streams the transcript with byte-offset bookkeeping.
- Codex: CLI; per-turn hook protocol with turn_id on every event and a separate rollout JSONL under ~/.codex/sessions/YYYY/MM/DD/ that carries the token_count events. The adapter accumulates a per-turn fragment in sessionstate and drains it on Stop.

To add another vendor (Windsurf / OpenCode / …):

1. Read the convention rule, .cursor/rules/coding-agents-convention.mdc. It defines the strict attribute matrix every adapter must follow. New vendors must NOT introduce a new namespace for facts that already have a canonical key.
2. Create the adapter package under cli/internal/coding/hook/<vendor>/. Map the vendor’s hook payload keys onto the canonical coding_agent.* / gen_ai.* schema. Use setStr(..., scrub) for any string; never bypass the redaction layer.
3. Extend peekContext in hook.go so the vendor’s session id, user, cwd, permission mode, model, and parent-conversation id resolve from the payload. The function is intentionally a thin string-pick, so add new key names to the pickString calls.
4. Wire the plugin manifest under plugins/<vendor>/. Mirror plugins/claude-code/ for Anthropic-style hooks or plugins/cursor/ for IDE-extension-style hooks. Add a marketplace entry in plugins/.claude-plugin/marketplace.json.
5. Bound transcript reads with tailfile.Tail if the vendor exposes a per-session log file. Never read an unbounded file in a hook process.
6. Stamp gen_ai.system correctly: anthropic for Claude Code, openai for Codex, google for Gemini-backed agents. Use the inferProvider(model, vendor) helper.
7. Honor capture modes: drop bodies in minimal and metadata_only. The bodyAllowed(mode) and setStr(..., scrub) helpers in cli/internal/otlp/attrs.go are mandatory.
8. Add UI icon + label: register an inline SVG in src/client/src/components/svg/coding-agents.tsx and add a switch case in CodingAgentVendorIcon AND hasCodingAgentVendorIcon. The existing call sites pick the icon up automatically once the helpers know about the vendor.
9. Add the onboarding snippet to docs/features/coding-agents/onboarding.mdx. Document any vendor-specific env vars (OTEL_* overrides, transcript paths, dual-path notes if applicable).
10. Sanity-check end to end:
    - Fire a real session through the vendor.
    - Confirm otel_traces has spans with SpanAttributes['coding_agent.client'] = '<vendor>'.
    - Trigger the materializer (curl -X POST http://localhost:3000/api/agents/materialize -H 'X-CRON-JOB: true').
    - Confirm a row lands in openlit_agents_summary with coding_agent_vendor = '<vendor>' and source = 'coding'.
    - Open /agents, click into the row, verify the Sessions and Users tabs render without errors and that the trace-detail pills (Repository, Working Folder, Branch, Mode, Terminal) populate.
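The peekContext step in the list above relies on a "thin string-pick". A helper of that kind might look like the sketch below. `pickString` borrows the name used in hook.go, but this body is a guess at its behavior, and the payload key names are invented for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pickString returns the first non-empty string value found under any of
// the candidate keys, in order. Non-string and missing values are skipped.
func pickString(payload map[string]any, keys ...string) string {
	for _, k := range keys {
		if v, ok := payload[k].(string); ok && v != "" {
			return v
		}
	}
	return ""
}

func main() {
	// Hypothetical payload from a new vendor that calls its session id
	// "conversation_id" instead of "session_id".
	raw := []byte(`{"conversation_id":"c-42","cwd":"/repo"}`)

	var payload map[string]any
	if err := json.Unmarshal(raw, &payload); err != nil {
		fmt.Println("bad payload:", err)
		return
	}

	// Supporting the new vendor is a matter of appending its key name.
	fmt.Println(pickString(payload, "session_id", "conversation_id"))
	fmt.Println(pickString(payload, "cwd", "workspace_root"))
}
```

Keeping the lookup this simple means a new vendor usually costs one extra key name per field rather than a new code path.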

Threat model & accepted risks

The CLI runs on the developer’s machine alongside the coding agent. Anyone who can run openlit coding hook on that machine can emit arbitrary coding_agent.* spans into the configured collector — including spans claiming to be from a different session, vendor, or user. We accept this:

- Spoofing scope is bounded by who already has shell access. A developer who can run the hook can equally well run any other process under their own user, so the CLI is not the sole gate on any privilege boundary.
- The OTLP endpoint authenticates the host, not the event. The Authorization header carries an org-scoped API key, so spans reaching the backend are at least “from some machine our org authorised”. Per-event signing is intentionally out of scope for v1: the cost (key distribution, rotation, hook-latency budget) outweighs the benefit for an in-org telemetry threat model.
- The query layer applies k=5 cohort floors at read time, so a single user cannot inflate their own visible metrics by spamming events; the per-user page refuses to render below five sessions.
- Repos and API keys drive the personal-vs-work classifier, not hook content. A user cannot relabel their own personal activity as work by tampering with hook payloads.

Operators that need stricter guarantees (e.g. an untrusted developer fleet) should:

- Front the OTLP endpoint with a collector that drops events whose coding_agent.client.user.id doesn’t match the authenticated key’s owner.
- Disable the hook tier and rely on vendor-native OTel only (Claude Code’s CLAUDE_CODE_ENABLE_TELEMETRY=1 path) where the vendor signs its own egress.

Where to read next

- .cursor/rules/coding-agents-convention.mdc: the strict schema contract every vendor must follow. The attribute matrix, the Cursor + Claude Code inbound mappings, and the dual-path coalesce rules all live here.
- .cursor/rules/coding-agents-hook.mdc: canonical authoring rules and failure modes, auto-loaded by Cursor when editing hook code.
- cli/internal/coding/hook/cursor/handle.go: reference adapter.
- cli/internal/coding/hook/claudecode/handle.go: second reference adapter, demonstrates dual-path-aware design.
- cli/internal/coding/normalize/normalize.go: schema all adapters emit into.
- cli/internal/otlp/attrs.go: capture-mode gating and scrubbing.
- src/client/src/lib/platform/agents/materialize.ts: rollup that populates openlit_agents_summary.
- src/client/src/lib/platform/coding-agents/queries.ts: Sessions, Users, Overview query layer.
