
Claude AI Vulnerability Scanner: Anthropic's Open-Source Code-Security Harness (2026)

Rohit Raj·June 5, 2026·12 min read

Anthropic open-sourced defending-code-reference-harness, a Claude-powered pipeline that finds and patches security bugs in your code, and it hit the GitHub Trending front page this week. Here's what actually shipped; how to run /vuln-scan on your own repo; how it compares to the claude-code-security-review Action, managed Claude Security, and Snyk/Semgrep/CodeQL; where it quietly breaks; and how I'd wire it into production CI without burning your token budget.


TL;DR

Anthropic open-sourced **defending-code-reference-harness**, a Claude-powered pipeline that finds, verifies, and patches code vulnerabilities, and it hit ~1.3k GitHub stars the week of June 4, 2026. It pairs file-only skills (/vuln-scan, /triage, /patch) with a 7-stage autonomous loop in which each Claude agent runs in a gVisor sandbox. The headline stat from Anthropic's own data: 1,596 vulnerabilities found, only 97 patched by May 22, 2026. Discovery is the cheap part; triage and fixing are the bottleneck. Run /vuln-scan on your repo today; never let autonomous patching auto-merge.

Claude AI Vulnerability Scanner: Discovery Is Cheap, Fixing Is Not

By Rohit Raj, Founding Engineer · 10+ yrs MVP shipping

Security teams using Claude disclosed 1,596 vulnerabilities by May 22, 2026. They patched 97. That patch rate of roughly 6%, straight from Anthropic's own engineering write-up (May 27, 2026), is the most honest number in AI security right now, and it reframes the whole conversation. AI made *finding* bugs almost free and trivially parallel; it did nothing to make *fixing* them faster.
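The arithmetic behind that figure, using only the two counts quoted above:

```python
# Counts quoted from Anthropic's write-up (as of May 22, 2026).
found = 1596
patched = 97

patch_rate = patched / found   # share of disclosed findings that were fixed
unpatched = found - patched    # findings still waiting on a human

print(f"patch rate: {patch_rate:.1%}")   # patch rate: 6.1%
print(f"still open: {unpatched}")        # still open: 1499
```

Read the other way round, about 94% of what was found was still open on that date.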

That gap is also why Anthropic's freshly open-sourced **defending-code-reference-harness** — a reference pipeline for using Claude to find and patch vulnerabilities — shot to ~1.3k GitHub stars the week of June 4, 2026. When a vulnerability scanner, not a frontier model, is the week's breakout repo, the AI-security question has clearly moved from "can LLMs find bugs?" to "what do we do with the firehose?" The harness is Anthropic's attempt to answer that, and the candor baked into it is exactly why it earns a working developer's attention.

Below: exactly what shipped, how to point /vuln-scan at your own repository, how the open-source harness stacks up against the claude-code-security-review GitHub Action, the managed Claude Security product, and traditional scanners like Snyk and Semgrep — plus where this quietly breaks and how I'd wire it into a real production pipeline.

What Anthropic Actually Open-Sourced

Strip the launch noise and here is the concrete surface area, from the repository README:

  • Two modes, different risk levels. A set of interactive skills — /threat-model, /vuln-scan, /triage, /patch, /customize — that only read and write files (safe to run unsandboxed inside Claude Code), and a separate autonomous harness that actually executes code and therefore requires a sandbox.
  • A 7-stage autonomous loop: Build → Recon → Find → Verify → Dedupe → Report → Patch. Recon partitions the source into input-parsing subsystems; *N* parallel agents craft malformed inputs and run until they reproduce a crash 3 out of 3 times; a separate grader agent re-runs each crash in a fresh container before it counts.
  • Sandboxed by default. Every agent runs in a gVisor-isolated container with network egress allow-listed to the Claude API only, so a misbehaving scanner agent can't send your code to any other host.
  • Ships for C/C++ memory bugs (Docker + AddressSanitizer out of the box) but is explicitly language- and vuln-class-agnostic — the /customize skill rewrites the pipeline for your stack.
  • Bring your own Claude access: works against the Claude API, Amazon Bedrock, Google Vertex, or Azure; the subagent model is set via CLAUDE_CODE_SUBAGENT_MODEL.
  • It's a reference implementation — Python (92.7% of the repo), not maintained and not accepting contributions. You fork it and own it.
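The loop in the second bullet can be sketched in a few lines. This is my own illustration of the stage ordering and the two reproduction gates, not code from the repo: `Stage`, `reproduces_reliably`, and `counts_as_finding` are hypothetical names, and the callables stand in for "run this input in a container and report whether it crashed".

```python
from enum import Enum
from typing import Callable


class Stage(Enum):
    """The seven stages in the order the README lists them."""
    BUILD = 1
    RECON = 2
    FIND = 3
    VERIFY = 4
    DEDUPE = 5
    REPORT = 6
    PATCH = 7


REQUIRED_REPRODUCTIONS = 3  # the "3 out of 3" rule described above


def reproduces_reliably(run_once: Callable[[], bool],
                        attempts: int = REQUIRED_REPRODUCTIONS) -> bool:
    """True only if every attempt crashes; all() stops at the first miss."""
    return all(run_once() for _ in range(attempts))


def counts_as_finding(finder_run: Callable[[], bool],
                      grader_run: Callable[[], bool]) -> bool:
    """A crash counts only if the finder reproduces it every time AND an
    independent grader re-run (a fresh container, in the real harness)
    also crashes."""
    return reproduces_reliably(finder_run) and grader_run()
```

The point of the second gate is that the agent that found the crash never gets to be the one that certifies it.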

The design choice that matters most is adversarial verification. Anthropic reports that adding an independent agent to disprove each finding "roughly halved the rate of non-exploitable findings," and a team that required a working proof-of-concept before reporting drove false positives to "near zero." That is the difference between an AI scanner you'll actually use and one you'll mute after a week of noise. It's the same multi-agent, verify-before-you-trust pattern I dug into in Claude Code dynamic workflows — here it's pointed at your attack surface instead of your feature backlog.

How Do You Run /vuln-scan on Your Own Repo?

You do not need the full sandbox to get value on day one. The interactive skills are file-only and run inside Claude Code, so the fastest first pass is a clone, a guided quickstart, and three slash commands:

```bash
git clone https://github.com/anthropics/defending-code-reference-harness
cd defending-code-reference-harness
claude            # open Claude Code in the repo

# 30-second guided run against the bundled "canary" target
> /quickstart

# Then point the same skills at your own code:
> /threat-model bootstrap ~/code/my-service
> /vuln-scan ~/code/my-service
> /triage ~/code/my-service/VULN-FINDINGS.json
```

That sequence does threat modeling, a static scan, and triage without executing anything; the skills only read and write files. It's the part I'd run first on any client codebase because it's low-risk and surfaces the obvious data-flow and access-control issues fast.
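Once a scan has written VULN-FINDINGS.json, the first thing I want is a count by severity. A minimal sketch follows, and the schema in it is an assumption: I'm pretending the file is a JSON array of objects with a `severity` key, which may not match what the skill actually writes, so inspect the real file before relying on this.

```python
import json
from collections import Counter

# ASSUMPTION: a JSON array of objects with "severity" and "file" keys.
# The real VULN-FINDINGS.json layout may differ; this sample is made up.
SAMPLE = """[
  {"id": "F-1", "severity": "high", "file": "api/accounts.py"},
  {"id": "F-2", "severity": "low",  "file": "api/health.py"},
  {"id": "F-3", "severity": "high", "file": "api/transfers.py"}
]"""


def severity_counts(raw_json: str) -> Counter:
    """Count findings per severity; entries without one land in 'unknown'."""
    findings = json.loads(raw_json)
    return Counter(f.get("severity", "unknown") for f in findings)


counts = severity_counts(SAMPLE)
print(counts.most_common())  # [('high', 2), ('low', 1)]
```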

When you're ready for execution-verified findings (real crashes, not pattern matches), you opt into the autonomous harness:

```bash
python3 -m venv .venv && .venv/bin/pip install -e .
./scripts/setup_sandbox.sh          # one-time: installs gVisor, builds agent images
export ANTHROPIC_API_KEY=sk-ant-...

# recon -> find -> verify -> report, 3 runs in parallel
bin/vp-sandboxed run my-service --model <model-id> --runs 3 --parallel --stream --auto-focus

# generate candidate patches from the verified findings
bin/vp-sandboxed patch results/my-service/<timestamp>/ --model <model-id>
```

The thing the quickstart won't tell you: cost and time concentrate in the autonomous `find` and `patch` stages, because that's where you're paying for *N* parallel agents to fuzz and re-fuzz. The interactive /vuln-scan is cheap; the full harness on a large target is not. Scope it to a subsystem with --auto-focus before you turn it loose on a monorepo — the same token-budget discipline I argued for in LLM context compression.
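To make the scaling concrete, here is a back-of-the-envelope estimator. Every number in it is a placeholder I made up for illustration: the agent counts, the token volumes, and the per-million-token prices are not measurements from the harness or real list prices, and I'm assuming that narrowing the scope reduces the number of agents.

```python
def estimate_cost_usd(
    runs: int,
    agents_per_run: int,
    input_tokens_per_agent: int,
    output_tokens_per_agent: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Rough spend for one harness invocation: agents x tokens x price."""
    agents = runs * agents_per_run
    input_cost = agents * input_tokens_per_agent / 1_000_000 * usd_per_million_input
    output_cost = agents * output_tokens_per_agent / 1_000_000 * usd_per_million_output
    return input_cost + output_cost


# Placeholder inputs only: substitute your own agent counts and current prices.
whole_repo = estimate_cost_usd(3, 8, 2_000_000, 200_000, 5.0, 25.0)
one_subsystem = estimate_cost_usd(3, 2, 2_000_000, 200_000, 5.0, 25.0)
print(f"whole repo: ${whole_repo:.2f}, one subsystem: ${one_subsystem:.2f}")
```

The absolute figures mean nothing; the shape does. Cost is linear in runs times agents, so scoping is the cheapest lever you have.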

Where Does an AI Vulnerability Scanner Earn Its Keep?

This is not a blanket replacement for your existing scanners. It pays off in three specific shapes of work.

1. Context-dependent bugs that pattern matchers miss. Traditional SAST (Snyk, Semgrep, CodeQL) is excellent at known signatures — a hardcoded secret, a SQL string concatenation, a vulnerable dependency version. It is weak at business-logic flaws, broken access control, and unsafe data flows that span multiple files. Claude reasons about those the way a reviewer does. On a fintech build like myFinancial, the bugs that scared me were never the ones a regex catches — they were "this endpoint trusts a user-supplied account ID three functions deep," and that's exactly the class an LLM scanner is built to find.

2. Triaging a backlog you already have. If you've ever run a commercial scanner and gotten 400 "findings," you know the real work is deciding which 12 are real. The harness's /triage skill with multi-vote confirmation (--votes 5) is genuinely useful *on findings you already have* — point it at an existing SARIF/JSON export and let it rank exploitability and kill the false positives.

3. C/C++ and memory-unsafe code. The out-of-the-box pipeline targets memory bugs with ASAN, which is the highest-stakes, hardest-to-audit category. If you maintain a parser, a codec, or any native library, this is the configuration that ships ready to use.

The thread through all three: it shines when the bug requires reasoning about intent, not matching a known bad string. For dependency CVEs and secret detection, your existing tools are faster and cheaper — keep them.
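The multi-vote confirmation from item 2 is easy to picture as a majority vote over independent verdicts. A minimal sketch, assuming each vote is a plain string label; `majority_verdict` and the "needs-human" fallback are my own illustration, not how the /triage skill is implemented.

```python
from collections import Counter


def majority_verdict(votes: list[str], quorum: float = 0.5) -> str:
    """Return the verdict held by more than `quorum` of the votes.

    Ties, and the empty case, fall through to 'needs-human' so that a
    split decision is escalated rather than silently resolved.
    """
    if not votes:
        return "needs-human"
    verdict, count = Counter(votes).most_common(1)[0]
    return verdict if count / len(votes) > quorum else "needs-human"


# Five independent passes over the same finding, as with --votes 5.
print(majority_verdict(["real", "real", "real", "fp", "fp"]))  # real
```

An odd vote count avoids ties between two labels, which is presumably why five is a convenient number.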

Harness vs GitHub Action vs Claude Security vs Snyk/Semgrep

Anthropic shipped *three* security offerings in the same window, and they're easy to confuse. Here's the honest split, including the traditional scanners you probably already run:

| Tool | Type | Runs where | Cost | Best for | Main tradeoff |
| --- | --- | --- | --- | --- | --- |
| defending-code-reference-harness | Open-source reference pipeline | Your machine / CI, self-hosted | Free + Claude API tokens | Deep audits, custom stacks, execution-verified findings | You fork and maintain it; not supported |
| claude-code-security-review | Free GitHub Action | CI on every PR | Free + API tokens | Diff-scoped review on pull requests | Scans the change, not the whole repo |
| Claude Security (managed) | Hosted product, Claude Opus 4.7 | Anthropic cloud / Claude Code on web | Paid (Enterprise public beta) | Teams that want scanning without owning a pipeline | Beta, Enterprise plans only; per-seat cost; less control |
| Snyk / Semgrep / CodeQL | Traditional SAST + SCA | CI, IDE | Free tier → paid | Dependency CVEs, secrets, known patterns, compliance | Misses multi-file logic flaws; noisy on novel bugs |

#### Claude Security vs Snyk — do you replace your scanner?

No — you layer it. Per The New Stack, managed Claude Security (built on Opus 4.7 — I compared that model's tradeoffs in Opus 4.8 vs 4.7) re-examines every finding to prove or disprove it before showing you, which is the verification layer Snyk lacks. But Snyk's dependency graph and license scanning are things the LLM doesn't do. The right 2026 stack is traditional SAST for known-pattern coverage + an AI scanner for the reasoning-heavy bugs — not one or the other.

When Should You Skip (or Gate) This?

Because the discovery side works so well, the failure modes all live downstream — which is exactly where Anthropic is most candid.

Autonomous patching is not production-ready. Anthropic's own write-up notes models generate inconsistent patches and that one team's fixes were "as restrictive as possible, to the point that they would break connections." A patch that closes a hole by breaking a feature is a regression with a security excuse. Never let `/patch` auto-merge — treat every generated fix as a draft PR for human review. This is the same invisible-failure trap I wrote about in AI-generated code anti-patterns: the code looks right and is wrong in a way tests don't catch.

Severity inflation is real. Without an understanding of your threat boundaries and compensating controls, the model "inflates severity." A finding it scores critical may be unreachable behind auth you didn't describe — which is why the /threat-model step isn't optional decoration; it's what calibrates everything after it.
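Here is the kind of adjustment a threat model makes possible, as a toy rule. The scale, the one-step downgrades, and the function name are all my own assumptions for illustration; the harness does not necessarily calibrate this way.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def calibrate(severity: str,
              reachable_unauthenticated: bool,
              compensating_controls: int) -> str:
    """Downgrade a raw model score using facts from the threat model.

    Illustrative rule only: one step down if the code path sits behind
    authentication, one more step per compensating control, floor at 'low'.
    """
    idx = SEVERITY_ORDER.index(severity)
    if not reachable_unauthenticated:
        idx -= 1
    idx -= compensating_controls
    return SEVERITY_ORDER[max(idx, 0)]


# A "critical" behind auth with one extra control (say, a WAF rule).
print(calibrate("critical", reachable_unauthenticated=False,
                compensating_controls=1))  # medium
```

Without the threat model, both arguments default to the worst case and every score stays where the model put it.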

It assumes you can sandbox. The autonomous loop needs Docker + gVisor. On a locked-down corporate laptop or a CI runner you don't fully control, you may be limited to the interactive (file-only) skills — still useful, but not the execution-verified mode.
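A quick preflight tells you which mode a given machine can support. This is a heuristic sketch: it only checks that `docker` and `runsc` (gVisor's runtime binary) are on the PATH, not that Docker is actually configured to use gVisor. `pick_mode` is a hypothetical helper, not part of the harness.

```python
import shutil
from typing import Callable, Optional


def pick_mode(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return 'autonomous' if both sandbox prerequisites are on PATH,
    otherwise 'interactive-only' (the file-only skills)."""
    has_docker = which("docker") is not None
    has_gvisor = which("runsc") is not None  # gVisor's OCI runtime
    return "autonomous" if has_docker and has_gvisor else "interactive-only"
```

Calling `pick_mode()` on a CI runner before scheduling an audit job lets you fall back to the interactive skills instead of failing halfway through setup.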

Token cost scales with thoroughness. Three parallel runs fuzzing a large target is real money. If your need is "block the obvious stuff on every PR," the lightweight GitHub Action or your existing SAST is the better-fit, cheaper tool. Reach for the full harness for *audits*, not for *every commit*.

How I'd Wire This Into a Production Pipeline

Here's the concrete way I'd actually adopt this on a client build, not the demo version.

Split it by risk, not by hype. The file-only /threat-model + /vuln-scan skills go in early and often — they're cheap and safe. The autonomous execution harness runs as a scheduled audit (weekly, or pre-release), never inline on every commit. That keeps the token bill predictable and matches each mode to its real cost.

Gate the diff, not the repo, on PRs. For per-PR coverage I'd run the claude-code-security-review Action scoped to the changed files — scanning the whole monorepo on every push is how you burn budget and train the team to ignore the bot. Diff-scoped + a required-check status is the integration that actually changes behavior.
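Diff-scoping itself is simple, shown here as a standalone sketch rather than a description of how the Action does it internally. The `origin/main` base ref and the extension list are assumptions to adjust for your repo.

```python
import subprocess
from pathlib import PurePosixPath

# ASSUMPTION: source types worth sending to a scanner; edit for your stack.
SCANNABLE = {".py", ".js", ".ts", ".go", ".java", ".c", ".cc", ".cpp", ".h"}


def changed_files(base: str = "origin/main") -> list[str]:
    """Paths added, copied, modified, or renamed since the merge base."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()


def scannable(paths: list[str]) -> list[str]:
    """Keep only source files; docs, lockfiles, and assets are skipped."""
    return [p for p in paths if PurePosixPath(p).suffix in SCANNABLE]
```

In a PR job, `scannable(changed_files())` is the file list you hand to the reviewer, and an empty list means the check can pass without spending a token.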

Make a human own every patch. I wire /patch output into a *draft* PR with the original proof-of-concept attached, assigned to a person. The 1,596-found / 97-patched gap is the warning: the bottleneck is human review capacity, and pretending an LLM closes that gap is how you ship a broken "fix." On a regulated build — payments, health, anything in fintech — that human gate is non-negotiable.
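One way to enforce that gate is to make the draft PR the only path a generated patch can take. The sketch below just builds a GitHub CLI command; the branch name, finding ID, and proof-of-concept file are hypothetical inputs, and `draft_pr_command` is my own helper, not something the harness ships.

```python
def draft_pr_command(branch: str, finding_id: str,
                     poc_path: str, assignee: str) -> list[str]:
    """Build a `gh pr create` invocation that can never open a ready PR.

    The PR body is read from a file that should contain the original
    proof-of-concept, so the reviewer sees what the patch claims to fix.
    """
    return [
        "gh", "pr", "create",
        "--draft",                      # never ready-for-review by default
        "--head", branch,
        "--title", f"[security] Candidate fix for {finding_id}",
        "--body-file", poc_path,
        "--assignee", assignee,         # a named human owns the review
    ]


cmd = draft_pr_command("fix/F-12", "F-12", "poc/F-12.md", "alice")
print(" ".join(cmd))
```

Pass the list to `subprocess.run(cmd, check=True)` from the job that runs after patch generation, and pair it with branch protection so nothing merges without an approving review.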

Treat it as a layer, log everything. It sits *alongside* Snyk/Semgrep in CI, not instead of them, and I persist every finding + verdict so the false-positive rate is measurable over time. Security tooling you can't measure is security theater. This is the same production-hardening mindset I bring to securing MCP servers — the quickstart gets you a demo; the wiring gets you something you can trust on a real codebase.
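Measuring the false-positive rate needs nothing more than an append-only log of human verdicts. A minimal sketch, assuming one JSON object per line; the field names and verdict labels are my own, and the `StringIO` stands in for a real file.

```python
import io
import json
from typing import Iterable, Optional, TextIO


def log_verdict(stream: TextIO, finding_id: str, tool: str, verdict: str) -> None:
    """Append one JSON line per finding as its verdict is recorded."""
    record = {"finding": finding_id, "tool": tool, "verdict": verdict}
    stream.write(json.dumps(record) + "\n")


def false_positive_rate(lines: Iterable[str]) -> Optional[float]:
    """Share of human-reviewed findings judged false positives.

    Findings still pending review are excluded; returns None when
    nothing has been reviewed yet.
    """
    verdicts = [json.loads(line)["verdict"] for line in lines if line.strip()]
    reviewed = [v for v in verdicts if v in ("true_positive", "false_positive")]
    if not reviewed:
        return None
    return reviewed.count("false_positive") / len(reviewed)


log = io.StringIO()  # stands in for an append-only file
log_verdict(log, "F-1", "harness", "true_positive")
log_verdict(log, "F-2", "harness", "false_positive")
log_verdict(log, "F-3", "semgrep", "true_positive")
log_verdict(log, "F-4", "harness", "pending")
rate = false_positive_rate(log.getvalue().splitlines())
print(f"{rate:.0%} of reviewed findings were false positives")
```

Because the tool name is in every record, the same log lets you compare the AI scanner against Snyk or Semgrep on the codebase you actually care about.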

If you want this kind of security-and-reliability engineering built into your product from day one instead of bolted on after an incident, that's the work I do: I run fixed-scope 6-week MVP builds, or you can hire a founding engineer in India to own the whole pipeline end to end.
