
Exfil through an approved tool: the GitHub MCP toxic flow

A public-repo issue tells an agent to go read the user's private repos and publish what it finds. The agent obeys, opening a public pull request full of private data. Every tool it used was one the user approved. A tool-name allowlist passes all of it; the only thing Herkos adds is a signed record of exactly what happened.

At a glance

Disclosed: Invariant Labs, May 26, 2025.
Target: the official github/github-mcp-server (over 14,000 stars at the time of disclosure).
CVE: none. Invariant framed it as an architectural problem, not a server bug to patch.
Vendor response: none that we could find; a community member later filed a reproduction issue (#844, Aug 2025), which went stale with no official reply.
Sources: Invariant Labs, Toxic Flow Analysis.

What happened

The attack is a "toxic agent flow": one agent session that combines three things: ingestion of untrusted content, access to private data, and an external write channel. Any one of these alone is fine; the combination in a single trust context is the hole.

1. Plant the payload. The attacker opens an issue in a public repo (the PoC used ukend0464/pacman) whose body contains injected instructions telling the agent to gather and publish the user's other repository data.
2. Victim triggers the agent benignly. The user runs an agent wired to the GitHub MCP server with a token spanning both their public and private repos, and gives an innocuous prompt like "check the open issues in this repo."
3. Untrusted content enters context. The issue-listing tool returns the malicious issue body verbatim. There is no trust boundary between attacker-authored text and the user's instructions.
4. Injection fires. The agent treats the injected text as instructions and pivots from "triage issues" to "collect data across the user's repositories."
5. Cross-repo read. With the same broad token, the agent reads private repositories the attacker cannot see. The PoC pulled a private repo ("Jupiter Star"), a plan to relocate to South America, and salary information into context.
6. Exfil via a public write. The agent auto-creates a pull request in the attacker-visible public repo containing the harvested private data.
7. Attacker reads the leak. The PR is public, so the attacker just reads it - never touching the private repo directly.
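
The steps above can be read as a session accumulating three capabilities one after another. The following is a minimal sketch of that idea, not Herkos code and not Invariant's tooling: the capability labels, the Session class, and replay_poc are all hypothetical names invented for illustration. It only shows why no single call looks wrong and why the combination does.

```python
from dataclasses import dataclass, field

# Hypothetical capability labels for this sketch (not from Herkos or Invariant).
UNTRUSTED_INPUT = "untrusted_input"   # e.g. reading a public issue body
PRIVATE_READ = "private_read"         # e.g. reading a private repository
PUBLIC_WRITE = "public_write"         # e.g. opening a PR in a public repo

TOXIC_COMBINATION = {UNTRUSTED_INPUT, PRIVATE_READ, PUBLIC_WRITE}


@dataclass
class Session:
    """Tracks which capabilities a single agent session has exercised."""
    seen: set = field(default_factory=set)

    def record(self, capability: str) -> bool:
        """Record one capability; return True once all three are present."""
        self.seen.add(capability)
        return TOXIC_COMBINATION <= self.seen


def replay_poc() -> list:
    """Replay the PoC's three relevant calls and flag each one."""
    session = Session()
    steps = [
        ("list issues in public repo", UNTRUSTED_INPUT),
        ("read private repo", PRIVATE_READ),
        ("create pull request in public repo", PUBLIC_WRITE),
    ]
    return [(name, session.record(cap)) for name, cap in steps]
```

In this sketch only the last call is flagged, and only because of what the session did earlier. A check that looks at each call in isolation has nothing to object to.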

Invariant was explicit that this "is not a flaw in the GitHub MCP server code itself, but rather a fundamental architectural issue that must be addressed at the agent system level." It is client-agnostic (the PoC used one desktop agent, but any agent on the server is exposed). In fairness, some practitioners argued it is classic indirect prompt injection plus an over-broad token rather than a novel GitHub bug - which is a reason to fix it at the agent level, not to dismiss it.

Where Herkos helps, and where it does not

Broker: does NOT prevent. create_pull_request is a tool the user legitimately allowed. A tool-name allowlist passes it. Prevention here needs per-task data scoping and least-privilege tokens (Invariant's own advice: one repo per session), not a name check.
Content gate: partial at best. With a served set pinned, Herkos blocks tool-call arguments carrying verbatim repo lines from outside it. It normalizes case and whitespace first, so a reflow or recase of a served line still trips it - but the leaked data here is repo names and facts, not necessarily served lines, and encoding or paraphrase defeats a verbatim match. Do not lean on it for this.
Receipt: strong. This is the real value. The audit log records that create_pull_request ran, with the request hashed, signed, and chained. After a leak you can prove exactly what left and when. Forensics, not a force field.
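
To make the content-gate limitation concrete, here is a minimal sketch of a verbatim line matcher that normalizes case and whitespace before comparing. It is an assumption about the general technique, not Herkos's implementation: normalize, carries_protected_line, and the sample line are all made up for illustration.

```python
import base64


def normalize(text: str) -> str:
    """Lowercase and collapse every run of whitespace to a single space."""
    return " ".join(text.lower().split())


def carries_protected_line(argument: str, protected_lines: list) -> bool:
    """Return True if any protected line appears verbatim (after
    normalization) inside the tool-call argument."""
    haystack = normalize(argument)
    return any(
        normalize(line) in haystack
        for line in protected_lines
        if line.strip()  # skip blank lines, which would match everything
    )


# Made-up sample line standing in for private repo content.
PROTECTED = ["annual_salary = 100000"]

recased = "PR body:\nAnnual_Salary   =\n 100000"
paraphrased = "The user earns one hundred thousand a year."
encoded = base64.b64encode(b"annual_salary = 100000").decode()
```

Under these assumptions the recased and reflowed copy still matches, while the paraphrase and the base64-encoded copy pass straight through. That is the gap described above: a verbatim match catches copying, not leaking.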

Here is the receipt

A real run of this repo's binary: the broker records the create_pull_request call into a signed, hash-chained log, then seals it. The log verifies offline with only the public key, flags truncation, and fails under the wrong key.

herkos verify - the audit log

# the broker recorded: open(seq 0) -> call(seq 1, create_pull_request) -> close(seq 2)
$ herkos verify --file audit/<session>.jsonl --pubkey 685e6104...ef87
VERIFIED  session=233b02ca28be2b28f0c1a45b84fc20fb calls=1 cleanly closed  tip=4638a19e...

# drop the signed close line to fake a shorter history:
$ herkos verify --file audit/truncated.jsonl --pubkey 685e6104...ef87
INCOMPLETE  session=233b02ca... calls=1  not cleanly closed (possible truncation)   (exit 1)

# verify the intact log against the wrong public key:
$ herkos verify --file audit/<session>.jsonl --pubkey 0000...0000
FAILED: receiptlog: line 0: signature invalid   (exit 1)
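
For readers who want the intuition behind the truncation check, the following is a minimal sketch of a hash-chained log with an explicit close entry. It is not Herkos's log format: the field names, the append and check helpers, and the status strings are hypothetical, and signatures are deliberately left out, so unlike the real log this sketch does not stop someone from recomputing the whole chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(entry: dict) -> str:
    """Hash every field of an entry except its own stored hash."""
    body = {k: v for k, v in entry.items() if k != "hash"}
    encoded = json.dumps(body, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def append(log: list, kind: str, detail: str = "") -> None:
    """Append an entry that commits to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"seq": len(log), "kind": kind, "detail": detail, "prev": prev}
    entry["hash"] = entry_hash(entry)
    log.append(entry)


def check(log: list) -> str:
    """Return "ok", "incomplete" (no closing entry), or "broken"."""
    prev = GENESIS
    for seq, entry in enumerate(log):
        if (entry["seq"] != seq
                or entry["prev"] != prev
                or entry["hash"] != entry_hash(entry)):
            return "broken"
        prev = entry["hash"]
    if not log or log[-1]["kind"] != "close":
        return "incomplete"
    return "ok"


def example_log() -> list:
    """Build the same shape as the run above: open, one call, close."""
    log = []
    append(log, "open")
    append(log, "call", "create_pull_request")
    append(log, "close")
    return log
```

Dropping the final entry from example_log() leaves a chain that is internally consistent but has no close, which is why the check reports it as incomplete rather than broken. Editing any earlier entry changes its hash and breaks the link to the next one.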

The honest boundary

Prevention is aspirational and belongs to agent-system design: scope data access per task, separate untrusted-input ingestion from privileged actions, use least-privilege tokens. Herkos does none of that. What it gives you is the one thing missing after an incident like this - a tamper-evident, offline-verifiable record of which tool ran and what context was present. Audit is real; the force field is not.
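
As an illustration of what per-task scoping means at the agent-system level, here is a minimal sketch contrasting a tool-name allowlist with a one-repo-per-session rule. None of this is Herkos functionality: read_file is a hypothetical tool name, victim/private-notes is a made-up repository, and both policy functions are invented for the example.

```python
# read_file is a hypothetical tool name; create_pull_request is the tool
# named in this case study.
ALLOWED_TOOLS = {"read_file", "create_pull_request"}

SESSION_REPO = "ukend0464/pacman"      # the public repo from the PoC
PRIVATE_REPO = "victim/private-notes"  # made-up private repo name


def name_allowlist_permits(tool: str) -> bool:
    """A name-only check: the target repository is never considered."""
    return tool in ALLOWED_TOOLS


def scoped_policy_permits(tool: str, repo: str, session_repo: str) -> bool:
    """Allow a call only if the tool is approved AND it targets the single
    repository this session was opened for."""
    return tool in ALLOWED_TOOLS and repo == session_repo
```

Under the name-only check, both the private read and the public pull request are permitted. Under the scoped check the pull request into the session repo is still permitted, but the read of the private repo is refused, so there is nothing private for the pull request to carry.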
