pentest-ai: Turn Claude Code into Your Offensive Security Research Assistant

Most AI-assisted security tools work by wrapping an LLM around a set of existing tools and letting it issue commands autonomously. pentest-ai takes a different approach. Rather than building a new agent framework from scratch, it extends Claude Code — Anthropic’s terminal-based agentic coding tool — with six highly specialised subagents, each carrying deep domain expertise in a specific phase of a penetration testing engagement.

The result is something more like a team of specialists sitting alongside you in your terminal than a single automated system running off on its own. You describe what you need in plain language, Claude routes to the right agent automatically, and that agent responds with the kind of depth and specificity you’d expect from someone who’s spent years in that discipline. No commands to memorise. No configuration to fiddle with. Just describe the task.

What is pentest-ai?

pentest-ai is an open-source collection of Claude Code subagent definitions — markdown files with YAML frontmatter that describe an agent’s purpose, capabilities, and the context in which it should activate. Install them into your Claude Code agents directory and they become available automatically whenever you’re working in Claude Code. Claude reads each agent’s description and routes your query to whichever specialist is most appropriate for what you’re asking.

The project is built by 0xSteph and covers the full penetration testing lifecycle through six agents: engagement planning, reconnaissance analysis, exploit guidance, detection engineering, STIG compliance, and report generation. Each agent is grounded in recognised industry frameworks — MITRE ATT&CK, PTES, OWASP, NIST 800-115, CVSS v3.1, and DISA STIGs — so the output isn’t just plausible-sounding; it maps to the standards that professional engagements actually follow.

One thing worth flagging upfront: these agents provide methodology guidance and analysis. They don’t execute attacks, access systems, or generate functional exploit code. That distinction matters — pentest-ai is a research and planning assistant, not an autonomous offensive tool. If you need autonomous tool execution, that’s a different category (see our earlier posts on HexStrike AI and PentestGPT). pentest-ai’s strength is in structured thinking: planning, analysis, interpretation, and documentation.

The Six Agents

1. Engagement Planner

The engagement planner handles the upfront scoping and methodology work that experienced practitioners spend significant time on — and that beginners often skip to their detriment. Feed it a description of your target environment and it produces a structured, phased engagement plan with MITRE ATT&CK technique mappings, time estimates, and rules of engagement templates.

An example prompt: “Plan an internal network pentest for a 500-endpoint Active Directory environment with a 2-week window.” The agent responds with a full phased plan: reconnaissance, enumeration, exploitation, post-exploitation, lateral movement, and reporting — each phase mapped to specific ATT&CK techniques, with realistic time allocations and notes on tooling. You can see a real example of its output in the project’s examples directory.

For someone preparing for their OSCP or working through their first professional engagement, this is the kind of structured starting point that previously required either a senior mentor or hours of reading PTES and NIST documentation.

2. Recon Advisor

The recon advisor parses output from over 20 tools — Nmap, Nessus, BloodHound, Masscan, Nikto, Shodan, and more — and turns raw scan data into a prioritised attack plan with specific follow-up commands. Paste in your Nmap output and it identifies the highest-value targets, maps known CVEs to discovered services, and recommends exactly what to run next.

This is where pentest-ai saves the most manual effort for working practitioners. Cross-referencing scan output against CVE databases, determining which findings are exploitable versus informational, and deciding what to pursue first is time-consuming work that the recon advisor handles well. The Nmap analysis example shows a detailed breakdown with prioritised vectors and command suggestions.

3. Exploit Guide

The exploit guide covers offensive methodology across Active Directory attacks, web application vulnerabilities, cloud privilege escalation, and post-exploitation. What sets it apart is a mandatory design constraint: every technique includes the defensive perspective alongside the offensive methodology. Ask how to execute AS-REP Roasting and you get the attack steps, the exact Impacket commands, OPSEC considerations — and what defenders see when it’s happening, how they detect it, and what logs it generates.

This dual-perspective approach is genuinely useful whether you’re an attacker (understanding detection risk informs your approach) or a defender (understanding the attack methodology makes your detection engineering better). The scope is broad: Kerberoasting, DCSync, delegation attacks, OWASP Top 10 web vulnerabilities, API security issues, deserialization attacks, and AWS/Azure/GCP privilege escalation paths are all covered.

4. Detection Engineer

The detection engineer produces deployment-ready detection rules in multiple formats: Sigma (the vendor-neutral standard), Splunk SPL, Elastic KQL, Sentinel KQL, and YARA. Alongside the rules themselves, it provides false positive analysis, tuning guidance, and threat hunting hypotheses.

This agent is particularly valuable for purple team work and for security engineers who need to translate offensive findings into defensive controls. The Kerberoasting detection example shows the same technique covered across Sigma, Splunk SPL, and Elastic KQL — ready to drop into your SIEM with tuning notes attached.

5. STIG Analyst

DISA STIG compliance work is notoriously tedious — cross-referencing control IDs against lengthy PDFs, working out exactly which registry keys or GPO settings need changing, and writing justifications for controls you can’t fully implement without breaking something. The STIG analyst handles all of this.

Give it a STIG control ID (such as V-220768) and it provides a full analysis: what the control requires, what breaks if you apply it in a standard environment, the exact GPO path and registry setting needed for remediation, verification commands, and a ready-to-use keep-open justification template for auditors when the control can’t be fully applied. It covers Windows, Linux, Active Directory, network devices, VMware, and application STIGs.

6. Report Generator

Report writing is where many technical practitioners struggle most. The report generator transforms a list of raw findings into a structured professional report following PTES/OWASP/SANS formatting standards, with an executive summary written for non-technical leadership, CVSS v3.1 scoring, CWE mapping, evidence formatting, and a prioritised remediation roadmap.

The SQL injection finding example shows how raw technical notes become a properly formatted finding with consistent severity scoring and remediation guidance. For practitioners doing their first professional report, or for anyone who wants to speed up the documentation phase of an engagement, this is where pentest-ai delivers immediate practical value.

Installation

Installation is about as simple as it gets. You need Claude Code installed and an active Claude Pro or Max subscription. If you’re not already using Claude Code, install it following Anthropic’s documentation.

Step 1: Clone the Repository

git clone https://github.com/0xSteph/pentest-ai.git

Step 2: Install the Agents

For global installation (agents available in any project):

mkdir -p ~/.claude/agents/
cp pentest-ai/agents/*.md ~/.claude/agents/

For project-specific installation:

mkdir -p .claude/agents/
cp pentest-ai/agents/*.md .claude/agents/

That’s the entire installation process. There’s no server to start, no dependencies to install beyond Claude Code itself, and no configuration files to edit. The agents are markdown files — you can open them in any text editor, read exactly what each agent knows and how it behaves, and modify them if you want to customise the behaviour for your specific workflow.
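A quick way to confirm the copy worked is to list what landed in the agents directory alongside each file's description line. The sketch below is illustrative and not part of the project: `list_agents` is a hypothetical helper name, and it assumes each agent's YAML frontmatter includes a single-line `description:` field.

```shell
# Hypothetical helper: print each installed agent file with its description line.
# Assumes the frontmatter contains a single-line "description:" field.
list_agents() {
  dir="${1:-$HOME/.claude/agents}"
  for f in "$dir"/*.md; do
    # Skip the unexpanded glob when the directory is empty or missing.
    [ -e "$f" ] || continue
    desc=$(grep -m1 '^description:' "$f" || true)
    printf '%s\t%s\n' "$(basename "$f")" "${desc:-(no description field found)}"
  done
}

list_agents                   # global install
list_agents .claude/agents    # project-specific install
```

If a file shows no description, open it in a text editor and check that the frontmatter survived the copy intact.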

Step 3: Open Claude Code and Start Working

Open Claude Code in your terminal and describe your task naturally. A few examples to get started:

# Engagement planning
"I need to plan an internal penetration test for a mid-size company
with Active Directory, 3 VLANs, and about 500 endpoints.
The engagement window is 2 weeks."

# Recon analysis — paste your scan output directly
"Analyze this Nmap output and tell me what to hit first:
[paste nmap -sV -sC output here]"

# AD attack research
"Walk me through AS-REP Roasting — how to execute it,
what tools to use, and how defenders detect it."

# Detection rule creation
"Create a Sigma detection rule for DCSync attacks,
with Splunk SPL and Elastic KQL versions."

# Report writing
"Compile these 8 findings into a professional pentest report
with an executive summary and remediation roadmap."

Claude reads your description, matches it against the agent definitions, and delegates automatically. If you want direct control rather than automatic routing, you can also invoke a specific agent explicitly by naming it in your request.

How the Routing Works

Each agent file contains a YAML frontmatter block with a description field. Claude Code reads these descriptions and uses them to decide which agent to activate based on what you’re asking. The recon advisor’s description tells Claude to activate when you paste scan output from Nmap, Nessus, or similar tools; the engagement planner activates when you describe a target environment and ask for a plan.

You can inspect exactly how this works by looking at any of the agent files after installation — they’re plain markdown with a short YAML header. The transparency is a feature: you know precisely what each agent is designed to handle, and you can modify the descriptions or capabilities if your use case requires something different. The project’s customisation guide walks through how to create your own agents or extend the existing ones.
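To make the file format concrete, here is a minimal sketch of what a custom agent definition might look like, written out from the shell. Everything in it is illustrative: the `log-triage` name, the description text, and the body are invented for this example and are not taken from the pentest-ai repository. The `name`, `description`, and `tools` keys follow Claude Code's subagent frontmatter format, but check the project's own files for the fields it actually uses.

```shell
# Write a hypothetical agent definition into a project-level agents directory.
agents_dir=".claude/agents"
mkdir -p "$agents_dir"

cat > "$agents_dir/log-triage.md" <<'EOF'
---
name: log-triage
description: Use when the user pastes Windows event logs or syslog excerpts and asks which entries deserve investigation first.
tools: Read, Grep
---

You are a log triage specialist. Summarise the supplied log excerpt,
rank suspicious entries by severity, and explain your reasoning.
Provide analysis only; do not execute commands against any system.
EOF

# Print only the frontmatter: the lines from the first "---" to the second.
awk '/^---$/ { n++; print; if (n == 2) exit; next } n == 1' "$agents_dir/log-triage.md"
```

The description is the part that matters for routing, so it is worth writing it as a plain statement of when the agent should be used.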

Chaining Agents Through an Engagement

The real power of pentest-ai becomes clear when you use the agents in sequence through a complete engagement. A typical workflow might look like this:

  1. Start with the engagement planner to build a phased methodology, establish your MITRE ATT&CK technique targets, and produce your rules of engagement documentation.
  2. Run your recon tooling (nmap, amass, bloodhound, etc.) and feed the output to the recon advisor to get a prioritised list of attack vectors with specific follow-up commands.
  3. Use the exploit guide to work through each attack vector — understanding the methodology, the exact tooling and commands, OPSEC considerations, and what detection looks like from the defender’s side.
  4. Feed your findings to the detection engineer if you’re doing purple team work, producing deployment-ready rules for each technique you’ve successfully executed.
  5. Run STIG analysis on any compliance-relevant findings if the engagement includes compliance scope.
  6. Compile the final report with the report generator, transforming your raw notes into a professional deliverable.

Because the whole chain runs inside one Claude Code session, the scope and findings you have already established can be carried forward to each agent, so the workflow flows naturally rather than requiring you to re-explain the engagement at every step.
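One way to keep each step's output easy to hand to the next agent is a per-phase folder for the engagement. The layout below is only a sketch: the `scaffold_engagement` helper, the folder names, and the `scope.md` file are assumptions made for illustration, not something the project prescribes.

```shell
# Hypothetical layout: one folder per step of the workflow above.
scaffold_engagement() {
  root="${1:-engagement}"
  for phase in 01-plan 02-recon 03-exploitation 04-detections 05-stig 06-report; do
    mkdir -p "$root/$phase"
  done
  # Keep scope and rules of engagement at the top level; never overwrite an existing file.
  [ -e "$root/scope.md" ] || printf '# Scope and rules of engagement\n' > "$root/scope.md"
}

scaffold_engagement acme-internal
ls acme-internal
```

Saving scan output into the recon folder and draft findings into the report folder means you can paste or reference them directly when you move on to the next agent.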

Who Is pentest-ai For?

The project’s README makes an interesting claim: “You don’t need to be an expert to use these agents. They communicate at whatever level you need — from explaining what Kerberoasting is to providing exact Impacket command syntax for a senior operator.” That range is genuinely useful.

For beginners and students working towards OSCP or similar certifications, pentest-ai provides a structured framework for thinking about engagements that would otherwise take years of mentorship to absorb. The engagement planner alone — with its MITRE ATT&CK mappings and phased methodology — gives you the mental model that experienced practitioners have internalised.

For experienced practitioners, it compresses the documentation and analysis phases of an engagement. Engagement planning, report writing, and detection rule generation are the phases that eat time without directly advancing the technical work — pentest-ai handles them faster and to a consistent standard.

For security engineers and blue teamers, the detection engineer and exploit guide combination is particularly valuable for purple team exercises. Understanding the attacker’s methodology in detail — including exactly what telemetry it generates — makes your detection engineering substantially better.

The prerequisites are modest: Claude Code with a Pro or Max subscription and, for any actual security testing, the standard professional requirements of signed rules of engagement and a defined scope. The project recommends holding (or working towards) OSCP, GPEN, PenTest+, CEH, or CPTS, though the agents are useful well before you reach that level.

A Note on What pentest-ai Doesn’t Do

It’s worth being clear about the boundaries. pentest-ai doesn’t execute tools, send packets, or touch any system. It’s a knowledge and methodology layer built on top of Claude Code, not an autonomous attack framework. If you ask the exploit guide how to execute a DCSync attack, it tells you the methodology, the tools (Mimikatz, Impacket’s secretsdump.py), the exact commands, and the detection signatures — but it doesn’t run those commands for you.

This is a deliberate design decision, and it’s the right one. The methodology and analysis layer is where AI assistance genuinely adds value for most practitioners; the execution layer is where human oversight, authorisation verification, and scope awareness need to remain firmly in control.

Final Thoughts

pentest-ai is a lean, well-designed project that does exactly what it says. The installation takes about thirty seconds. The agents are immediately useful. The dual offensive/defensive perspective baked into the exploit guide is a thoughtful design choice that makes the tool more valuable for a broader range of security roles. And the fact that it’s built on Claude Code rather than a proprietary framework means it benefits from Anthropic’s ongoing model improvements without requiring any changes to the agents themselves.

It’s a newer project — only a handful of stars at the time of writing — but the quality of the agent design suggests it will find its audience quickly among practitioners who are already using Claude Code in their daily workflow. If that’s you, it’s well worth adding to your Claude agents directory today.

Find the project at github.com/0xSteph/pentest-ai, with full documentation, example outputs for each agent, and a customisation guide for adapting the agents to your specific workflow.

