Exfil through an approved tool: the GitHub MCP toxic flow
A public-repo issue tells an agent to go read the user's private repos and publish what it finds. The agent obeys, opening a public pull request full of private data. Every tool it used was one the user approved. A tool-name allowlist passes all of it; the only thing Herkos adds is a signed record of exactly what happened.
Disclosed: Invariant Labs, May 26 2025. Target: the official github/github-mcp-server (over 14,000 stars at the time of disclosure). CVE: none - Invariant framed it as an architectural problem, not a server bug to patch. Vendor response: none that we could find; a community member later filed a reproduction issue (#844, Aug 2025) which went stale with no official reply. Sources: Invariant Labs, Toxic Flow Analysis.
What happened
The attack is a "toxic agent flow": one agent session that combines ingesting untrusted content, access to private data, and an external write channel. Any one is fine; the combination in a single trust context is the hole.
- Plant the payload. The attacker opens an issue in a public repo (the PoC used ukend0464/pacman) whose body contains injected instructions telling the agent to gather and publish the user's other repository data.
- Victim triggers the agent benignly. The user runs an agent wired to the GitHub MCP server with a token spanning both their public and private repos, and gives an innocuous prompt like "check the open issues in this repo."
- Untrusted content enters context. The issue-listing tool returns the malicious issue body verbatim. There is no trust boundary between attacker-authored text and the user's instructions.
- Injection fires. The agent treats the injected text as instructions and pivots from "triage issues" to "collect data across the user's repositories."
- Cross-repo read. With the same broad token, the agent reads private repositories the attacker cannot see. The PoC pulled a private repo ("Jupiter Star"), a plan to relocate to South America, and salary information into context.
- Exfil via a public write. The agent auto-creates a pull request in the attacker-visible public repo containing the harvested private data.
- Attacker reads the leak. The PR is public, so the attacker just reads it - never touching the private repo directly.
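The three-way combination behind these steps can be sketched as a small session tracker. This is a minimal illustration and not part of Herkos or the GitHub MCP server: the capability labels, the `SessionFlow` class, and the mapping of tool calls to labels are all assumptions made for the example. It is also a coarse model, since it treats every external write as attacker-visible.

```python
from dataclasses import dataclass, field

# Hypothetical capability labels; mapping real tool calls onto them is left out.
UNTRUSTED_INPUT = "untrusted_input"   # e.g. reading a public issue body
PRIVATE_READ = "private_read"         # e.g. reading a private repository
EXTERNAL_WRITE = "external_write"     # e.g. opening a public pull request

TOXIC_COMBINATION = frozenset({UNTRUSTED_INPUT, PRIVATE_READ, EXTERNAL_WRITE})


@dataclass
class SessionFlow:
    """Accumulates the capabilities one agent session has exercised so far."""
    seen: set = field(default_factory=set)

    def would_be_toxic(self, capability: str) -> bool:
        # True if adding this capability completes the three-way combination.
        return TOXIC_COMBINATION <= (self.seen | {capability})

    def record(self, capability: str) -> None:
        # Refuse the call that would complete the combination; allow anything else.
        if self.would_be_toxic(capability):
            raise PermissionError(f"toxic flow: {capability} completes the combination")
        self.seen.add(capability)


# Replay the PoC sequence: the first two steps pass, the third is refused.
flow = SessionFlow()
flow.record(UNTRUSTED_INPUT)
flow.record(PRIVATE_READ)
try:
    flow.record(EXTERNAL_WRITE)
    blocked = False
except PermissionError:
    blocked = True
```

Any two of the three capabilities pass; only the call that completes the set is refused, which mirrors the point that each piece is fine on its own.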
Invariant was explicit that this "is not a flaw in the GitHub MCP server code itself, but rather a fundamental architectural issue that must be addressed at the agent system level." It is client-agnostic (the PoC used one desktop agent, but any agent on the server is exposed). In fairness, some practitioners argued it is classic indirect prompt injection plus an over-broad token rather than a novel GitHub bug - which is a reason to fix it at the agent level, not to dismiss it.
Where Herkos helps, and where it does not
- Broker: does NOT prevent. create_pull_request is a tool the user legitimately allowed. A tool-name allowlist passes it. Prevention here needs per-task data scoping and least-privilege tokens (Invariant's own advice: one repo per session), not a name check.
- Content gate: partial at best. With a served set pinned, Herkos blocks tool-call arguments carrying verbatim repo lines from outside it. It normalizes case and whitespace first, so a reflow or recase of a served line still trips it - but the leaked data here is repo names and facts, not necessarily served lines, and encoding or paraphrase defeats a verbatim match. Do not lean on it for this.
- Receipt: strong. This is the real value. The audit log records that create_pull_request ran, with the request hashed, signed, and chained. After a leak you can prove exactly what left and when. Forensics, not a force field.
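To see why a normalized verbatim match is only partial cover, here is a rough sketch of that kind of check. It is not Herkos's implementation: `normalize`, `leaked_lines`, and the sample strings are made up for illustration, and how the served set is built is not modelled at all.

```python
import base64


def normalize(text: str) -> str:
    # Fold case and collapse whitespace runs so reflowed or recased text still matches.
    return " ".join(text.lower().split())


def leaked_lines(argument: str, guarded_lines: list[str]) -> list[str]:
    """Return guarded lines that appear verbatim, after normalization, in a tool-call argument."""
    haystack = normalize(argument)
    return [line for line in guarded_lines
            if normalize(line) and normalize(line) in haystack]


# Made-up guarded line and three ways it might show up in a pull request body.
guarded = ["Relocation target:  Q3"]
reflowed = "summary: RELOCATION\n   target: q3"
paraphrased = "they plan to move in the third quarter"
encoded = base64.b64encode(guarded[0].encode()).decode()

caught = leaked_lines(reflowed, guarded)          # recase and reflow still match
missed_paraphrase = leaked_lines(paraphrased, guarded)
missed_encoding = leaked_lines(encoded, guarded)
```

The reflowed copy is caught, while the paraphrase and the base64 copy both come back empty, which is the gap described in the content-gate bullet.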
Here is the receipt
A real run of this repo's binary: the broker records the create_pull_request call into a signed, hash-chained log, then seals it. The log verifies offline with only the public key, flags truncation, and fails under the wrong key.
    # the broker recorded: open(seq 0) -> call(seq 1, create_pull_request) -> close(seq 2)
    $ herkos verify --file audit/<session>.jsonl --pubkey 685e6104...ef87
    VERIFIED session=233b02ca28be2b28f0c1a45b84fc20fb calls=1 cleanly closed tip=4638a19e...

    # drop the signed close line to fake a shorter history:
    $ herkos verify --file audit/truncated.jsonl --pubkey 685e6104...ef87
    INCOMPLETE session=233b02ca... calls=1 not cleanly closed (possible truncation)   (exit 1)

    $ herkos verify --file audit/<session>.jsonl --pubkey 0000...0000
    FAILED: receiptlog: line 0: signature invalid   (exit 1)
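The three outcomes above can be reproduced in miniature with a hash-chained log. The sketch below is an assumption-heavy stand-in, not the Herkos log format: the record fields, `append`, and `verify` are invented for the example, and an HMAC replaces the real signature only to keep the code stdlib-only. An HMAC needs the secret key to verify, so unlike the tool it cannot be checked with a public key alone.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous digest" for the first record
FIELDS = ("seq", "kind", "payload_sha256", "prev")


def append(log: list, key: bytes, kind: str, payload: str = "") -> None:
    """Append a record that commits to the payload hash and the previous record's digest."""
    body = {
        "seq": len(log),
        "kind": kind,
        "payload_sha256": hashlib.sha256(payload.encode()).hexdigest(),
        "prev": log[-1]["digest"] if log else GENESIS,
    }
    encoded = json.dumps(body, sort_keys=True).encode()
    # ASSUMPTION: HMAC stands in for an asymmetric signature.
    tag = hmac.new(key, encoded, hashlib.sha256).hexdigest()
    log.append({**body, "digest": hashlib.sha256(encoded).hexdigest(), "tag": tag})


def verify(log: list, key: bytes) -> str:
    """Check tags and chain links in order, then require a final close record."""
    prev = GENESIS
    for i, rec in enumerate(log):
        encoded = json.dumps({k: rec[k] for k in FIELDS}, sort_keys=True).encode()
        expected = hmac.new(key, encoded, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, rec["tag"]):
            return f"FAILED: line {i}: signature invalid"
        if rec["seq"] != i or rec["prev"] != prev:
            return f"FAILED: line {i}: chain broken"
        if hashlib.sha256(encoded).hexdigest() != rec["digest"]:
            return f"FAILED: line {i}: digest mismatch"
        prev = rec["digest"]
    if not log or log[-1]["kind"] != "close":
        return "INCOMPLETE"
    return "VERIFIED"


key = b"demo-key"  # illustrative only
log = []
append(log, key, "open")
append(log, key, "call", '{"tool": "create_pull_request"}')
append(log, key, "close")

full = verify(log, key)
truncated = verify(log[:-1], key)       # drop the close record
wrong_key = verify(log, b"other-key")
```

Because each record commits to the digest of the one before it, removing or editing a middle record breaks the chain, and dropping the tail is caught by the missing close record.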
Prevention is aspirational and belongs to agent-system design: scope data access per task, separate untrusted-input ingestion from privileged actions, use least-privilege tokens. Herkos does none of that. What it gives you is the one thing missing after an incident like this - a tamper-evident, offline-verifiable record of which tool ran and what context was present. Audit is real; the force field is not.
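For contrast, the "one repo per session" advice could look roughly like the guard below, enforced by whatever sits between the agent and the token. This is a sketch of the idea only: `SessionScope`, the repository names, and the tool names passed to it are placeholders, and nothing here is a feature of Herkos or the GitHub MCP server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionScope:
    """Pins one agent session to a single repository (hypothetical policy object)."""
    allowed_repo: str  # "owner/name"

    def check(self, tool: str, repo: str) -> None:
        # Compare case-insensitively and refuse any call that names another repository.
        if repo.lower() != self.allowed_repo.lower():
            raise PermissionError(
                f"{tool} on {repo!r} is outside session scope {self.allowed_repo!r}"
            )


# Placeholder names: the session was started to triage one public repo.
scope = SessionScope("victim/public-repo")
scope.check("list_issues", "victim/public-repo")  # in scope, no exception

try:
    scope.check("get_file_contents", "victim/private-repo")
    cross_repo_allowed = True
except PermissionError:
    cross_repo_allowed = False
```

With a scope like this the injected instruction still reaches the agent, but the cross-repo read fails before any private data enters context.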