Security model
A security tool that hides its gaps is worse than no tool, because you trust it more than you should. This page states what Herkos enforces today, what it does not, and the bypass we publish ourselves.
We will not tell you "your code never leaves" as an unqualified absolute. Herkos in userspace mode gives you deny-by-default tool control and a signed audit trail. It is not a kernel-enforced seal. Where that distinction matters, it is spelled out below.
Pin what your agent sees. That is all that can leave. The spans Herkos serves are the same set it allows out: with a served set pinned, repo lines from outside that set are blocked on the way to a tool call. The match normalizes case and whitespace first, so a recase or a reflow of a served line still trips it. This is a userspace tripwire, not a sandbox - encoded or paraphrased exfil, or a line split across calls, can still slip past, and a kernel-enforced boundary (Landlock, seccomp, eBPF) is next on the roadmap.
Threat model
Herkos is built for a setup where you run an AI coding agent against MCP servers you do not fully trust. The adversaries it has in mind:
- Untrusted or compromised MCP servers. A server you wired up may be over-scoped, may have a poisoned tool description that nudges the agent into leaking, or may try to exfiltrate through tools you never meant to expose.
- Over-broad context. The agent hauls more of your source to the model than the query needs, so you pay for tokens and widen the blast radius of any leak.
- No record. Without a receipt, you cannot say afterward which lines of your code touched the model.
What Herkos is not designed to stop: a fully malicious agent binary running with your privileges, kernel-level compromise of your machine, or a model endpoint you deliberately send data to. The leash constrains the agent's egress surface; it does not sandbox your whole OS.
What v1 enforces
- Deny-by-default tool control. The in-path broker (herkos serve) passes only the tools/call whose tool name you allowed with --allow-tool. Anything else is blocked in-path and answered with a JSON-RPC error, and the session keeps running.
- A signed audit trail. Every answer ships a signed Merkle receipt of which spans touched the model, verifiable locally with your own ed25519 key.
- A context-bound call log. serve --receipts writes a per-call, ed25519-signed, sha256 hash-chained record of every brokered tools/call: the tool name, a hash of the request, and the allow/deny decision. It is fail-closed and offline-verifiable with only the public key; herkos verify detects any edit, reorder, truncation, or wrong key. With a served set pinned, the log's opening record commits a fingerprint of the served context, so the receipt proves which context-egress binding was in force. This is the per-call decision log, separate from the Merkle receipt of which spans touched the model.
- Config auditing. herkos scan flags over-scoped tools, poisoned tool descriptions, and servers with unrestricted egress, with a receipt you can diff against a baseline.
- Dual-use egress gate (opt-in). Pin the spans the model may see with --served-span and Herkos blocks tools/call arguments that carry repo lines from outside that set. Matching normalizes case and whitespace, so a recase or a reflow of a served line still trips it. It is a userspace tripwire on line content, not a sandbox; the published bypass below says exactly what it does not catch.
The broker sits between the agent and the upstream server, gates each tools/call deny-by-default, and records every brokered call to a signed, hash-chained log.
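The deny-by-default decision can be pictured with a short Go sketch. This is an illustration under stated assumptions, not Herkos's actual code: the `gate` function, the `request` and `errorResponse` types, the choice of error code -32000, and the fail-closed handling of unparseable input are all hypothetical. It only shows the shape described above: a tools/call passes when its tool name is on the allowlist, and anything else gets a JSON-RPC error while the session continues.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// request is the subset of a JSON-RPC 2.0 request this sketch reads.
type request struct {
	ID     json.RawMessage `json:"id"`
	Method string          `json:"method"`
	Params json.RawMessage `json:"params"`
}

// rpcError and errorResponse model a JSON-RPC 2.0 error reply.
type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

type errorResponse struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"`
	Error   rpcError        `json:"error"`
}

// deny builds the error body sent back to the agent for a blocked message.
func deny(id json.RawMessage, code int, msg string) []byte {
	out, _ := json.Marshal(errorResponse{
		JSONRPC: "2.0",
		ID:      id,
		Error:   rpcError{Code: code, Message: msg},
	})
	return out
}

// gate returns (nil, true) when the message may be forwarded upstream, or an
// error reply and false when it is blocked. Only the tool name is checked.
func gate(raw []byte, allowed map[string]bool) (reply []byte, forward bool) {
	var req request
	if err := json.Unmarshal(raw, &req); err != nil {
		// Assumption: unparseable input is refused rather than forwarded.
		return deny(nil, -32700, "parse error"), false
	}
	if req.Method != "tools/call" {
		// Mirrors the documented v1 gap: other methods are not gated.
		return nil, true
	}
	var p struct {
		Name string `json:"name"`
	}
	if err := json.Unmarshal(req.Params, &p); err != nil || !allowed[p.Name] {
		// -32000 is a hypothetical code from the implementation-defined range.
		return deny(req.ID, -32000, "tool not allowed: "+p.Name), false
	}
	return nil, true
}

func main() {
	allowed := map[string]bool{"read_file": true}
	msgs := []string{
		`{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"read_file"}}`,
		`{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"shell"}}`,
		`{"jsonrpc":"2.0","id":3,"method":"resources/read","params":{"uri":"file:///x"}}`,
	}
	for _, m := range msgs {
		reply, forward := gate([]byte(m), allowed)
		fmt.Printf("forward=%v reply=%s\n", forward, reply)
	}
}
```

Note that the third message is forwarded even though it names no allowed tool. That is the same method-level gap the published bypass section documents.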
The published bypass
Here is exactly how to get data past v1's in-path broker. We publish it because you should know the boundary before you lean on it.
- Hide data in an allowed tool's arguments. Unless a served set is pinned, the broker gates the tool name only and does not inspect parameters. An allowed tool can carry whatever you stuff into its args, and the broker will pass it.
- Use a method that is not tools/call. The in-path broker only gates tools/call in v1. Other JSON-RPC methods, including resources/read, are not gated.
- Encode out-of-set bytes before they leave. With a served set pinned, the content gate blocks tool-call arguments carrying repo lines from outside that set. It normalizes case and whitespace before matching, so a recase or a reflow no longer slips a served line past it. What still defeats the match: base64 or other encoding, a paraphrase or token-rewrite, or splitting a line across calls - and an unarmed broker (no served set) does not inspect arguments at all. The transformation-resistant, kernel-enforced boundary is the roadmap item, not this tripwire.
- Open your own socket and go around the broker. The userspace broker sees what flows over MCP, not what a server dials out on its own. For a server that only needs stdio to Herkos, serve --isolate closes this: it launches the server in a fresh Linux network namespace with no interface but loopback, so it has no route to any host - unprivileged, and proven in the test suite (without --isolate the same probe reaches HTTP 200; with it, 000, no route). A server that legitimately needs its own egress runs unisolated and can still open a socket the broker cannot see; constraining that to a per-destination allowlist needs elevated privilege (eBPF), the remaining roadmap item.
- Truncate the audit log locally. The serve --receipts log is a signed hash-chain, so editing, reordering, or dropping a middle record breaks herkos verify. But a local attacker with write access can chop the most recent entries, and the shorter prefix still verifies cryptographically. Herkos makes this detectable: a truncated log is missing its signed close record, so verify reports "not cleanly closed", and serve prints the tip hash on shutdown so you can anchor it out of band. It does not prevent local truncation; that needs an external transparency log.
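To make the encoding bypass concrete, here is a minimal Go sketch of a line match that normalizes case and whitespace. It is not the Herkos matcher: `normalize` and `leaks` are hypothetical names, and matching a normalized substring across the whole argument is an assumption about how a reflow could be caught. The sketch shows why a recase or reflow still trips such a match while base64 or a split across calls does not.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// normalize lowercases s and collapses every run of whitespace, including
// newlines, to a single space.
func normalize(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), " ")
}

// leaks reports whether arg carries any out-of-set repo line once both sides
// are normalized. Blank lines are skipped so they cannot match everything.
func leaks(arg string, outOfSet []string) bool {
	hay := normalize(arg)
	for _, line := range outOfSet {
		n := normalize(line)
		if n != "" && strings.Contains(hay, n) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical repo line that is outside the served set.
	line := "func settleInvoice(id int64) error {"
	outOfSet := []string{line}

	recased := "FUNC  settleinvoice(ID int64)\n   error {"
	encoded := base64.StdEncoding.EncodeToString([]byte(line))
	firstHalf, secondHalf := line[:18], line[18:]

	fmt.Println("recase + reflow caught:", leaks(recased, outOfSet))
	fmt.Println("base64 caught:         ", leaks(encoded, outOfSet))
	fmt.Println("split call 1 caught:   ", leaks(firstHalf, outOfSet))
	fmt.Println("split call 2 caught:   ", leaks(secondHalf, outOfSet))
}
```

Only the first probe is caught. The other three carry the same bytes in a form a line match cannot see, which is why the page calls this a tripwire rather than a boundary.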
serve --isolate adds a real kernel boundary for stdio-only servers; per-destination egress control for servers that need their own network is still coming.
What the audit log does hold up against: every brokered call is signed and hash-chained as it is written, and any tampered byte fails herkos verify offline with only the public key.
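The tamper and truncation behavior can be sketched with Go's standard `crypto/ed25519` and `crypto/sha256` packages. This is a simplified model, not the Herkos receipt format: the `record` layout, the helper names, and the use of a literal "close" payload as the close record are assumptions made for illustration. Each record's hash covers the previous hash plus the payload, and the signature covers that hash.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"fmt"
)

// record is one hypothetical log entry.
type record struct {
	Prev    [32]byte // hash of the previous record, zero for the first
	Payload []byte
	Sig     []byte // ed25519 signature over sha256(Prev || Payload)
}

// newKey generates a signing key pair using crypto/rand.
func newKey() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(nil)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

func entryHash(prev [32]byte, payload []byte) [32]byte {
	return sha256.Sum256(append(prev[:], payload...))
}

// appendRecord chains a new signed record onto the log.
func appendRecord(entries []record, priv ed25519.PrivateKey, payload []byte) []record {
	var prev [32]byte
	if n := len(entries); n > 0 {
		prev = entryHash(entries[n-1].Prev, entries[n-1].Payload)
	}
	h := entryHash(prev, payload)
	return append(entries, record{Prev: prev, Payload: payload, Sig: ed25519.Sign(priv, h[:])})
}

// verify checks the chain and every signature using only the public key.
// closed is true only when the final record is the close marker.
func verify(entries []record, pub ed25519.PublicKey) (closed bool, err error) {
	var want [32]byte
	for i, r := range entries {
		if r.Prev != want {
			return false, fmt.Errorf("record %d: chain broken", i)
		}
		h := entryHash(r.Prev, r.Payload)
		if !ed25519.Verify(pub, h[:], r.Sig) {
			return false, fmt.Errorf("record %d: bad signature", i)
		}
		want = h
	}
	n := len(entries)
	return n > 0 && string(entries[n-1].Payload) == "close", nil
}

func main() {
	pub, priv := newKey()
	var entries []record
	for _, p := range []string{"allow read_file", "deny shell", "close"} {
		entries = appendRecord(entries, priv, []byte(p))
	}

	closed, err := verify(entries, pub)
	fmt.Println("intact log:   closed =", closed, "err =", err)

	// A truncated prefix still verifies, but the close record is missing.
	closed, err = verify(entries[:2], pub)
	fmt.Println("truncated log: closed =", closed, "err =", err)

	// Editing any payload invalidates that record's signature.
	entries[1].Payload = []byte("allow shell")
	_, err = verify(entries, pub)
	fmt.Println("edited log:   err =", err)
}
```

The middle case is the gap the page publishes: truncation leaves a valid chain, so it can be detected through the missing close record or an externally anchored tip hash, but it is not prevented.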
What closes the gap
The gaps above are not permanent; they are the next work. Two things close most of them:
| Work | What it adds |
|---|---|
| SpanGate payload path | Shipped as a userspace tripwire: with a served set pinned, repo lines outside it are blocked on the egress path (--served-span), matched after case and whitespace are normalized. What remains is provenance that survives deeper transformation (encoding, paraphrase, splitting a line across calls) so egress is enforced rather than approximated by a line match. |
| Hardened mode | Moves enforcement into the OS. First layer shipped: serve --isolate runs a server in a no-egress network namespace, unprivileged. Still to come: per-destination host allowlisting via eBPF, plus Landlock and seccomp - the kernel-enforced seal that userspace cannot be. |
When herkos scan prints "your code never left this machine", that is a true statement about that run: the scan read local config and made no network call. It is not a guarantee about every future agent session. Treat it as a receipt for one run, not a promise about all of them.
Reporting an issue
Herkos is local-first and open source under Apache-2.0, written in Go. If you find a way past the broker that is not already listed above, that is exactly the kind of finding worth filing. Open an issue on GitHub. The bar we hold ourselves to is to publish gaps before someone else has to.