The `test-driven-development` Claude Code Skill: Make Claude Write Tests First

What Is the `test-driven-development` Skill?

The `test-driven-development` skill loads a set of instructions that change how Claude approaches any implementation task. Instead of writing the implementation and then asking “should I add tests?”, Claude:

  1. Writes a failing test first (red)
  2. Writes the minimum implementation to make the test pass (green)
  3. Refactors while keeping tests green (refactor)

This is the classic TDD loop, applied to AI-assisted development. The skill gives Claude the reasoning patterns to maintain this discipline across a full session, including when you are tempted to skip ahead.

Why AI Code Generation Needs TDD More Than Human Code Does

When a human skips tests, they at least have the cognitive context of what they just wrote. When Claude skips tests, you have code that was generated at inference speed — potentially hundreds of lines — with no executable verification of its behaviour. The gap between “Claude wrote this” and “this is correct” is exactly where tests live.

There is also a scope control benefit. TDD forces each implementation increment to be small enough to test in isolation. This naturally constrains Claude’s output to focused, verifiable chunks rather than large monolithic generations that are hard to review.

Installation

claude skill add test-driven-development

Or via the Skills panel: Customize > Skills > search “test-driven-development” > toggle on. The skill is framework-agnostic. It works with Jest, Vitest, pytest, RSpec, Go’s testing package, and any other test framework. Tell Claude which framework your project uses and it will generate tests in the correct format.

What the Skill Actually Does

Test-first enforcement — When you give Claude an implementation task, it produces the test file before the implementation file. If you ask it to skip the tests, the skill’s instructions push back and explain why.

Failing test verification — Claude generates tests that are designed to fail before the implementation exists. A test that passes before the code is written is not testing anything.
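For illustration, here is what the red phase can look like in Vitest. The `formatPrice` helper and its file are invented for this sketch; the suite cannot even import the module yet, and that failure is the evidence the test exercises unwritten code.

import { describe, it, expect } from 'vitest';
// This import fails until the implementation file exists. That first
// failure is the point: it proves the test targets unwritten code.
import { formatPrice } from './formatPrice';

describe('formatPrice', () => {
  it('renders an amount in cents as a dollar string', () => {
    expect(formatPrice(1999)).toBe('$19.99');
  });
});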

Minimal green implementations — After the failing test, Claude writes only enough code to make the test pass. It does not gold-plate or anticipate requirements not captured in the test.

Test quality checks — The skill gives Claude heuristics for test quality: tests should be isolated, deterministic, and test behaviour rather than implementation details.
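As a sketch of what “deterministic” means in practice (the `isExpired` helper is hypothetical): the current time is passed in rather than read from the system clock inside the function, so the test gives the same answer on every run.

// Hypothetical example: determinism via dependency injection. The
// current time is a parameter, not a hidden Date.now() call, so the
// test cannot flake as the wall clock moves.
export function isExpired(expiresAt: Date, now: Date): boolean {
  return now.getTime() >= expiresAt.getTime();
}

describe('isExpired', () => {
  it('treats a past expiry date as expired', () => {
    expect(isExpired(new Date('2024-01-01'), new Date('2024-06-01'))).toBe(true);
  });
});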

Regression coverage — When fixing bugs, Claude writes a failing test that reproduces the bug before patching it. This ensures the bug is covered by the test suite and cannot silently regress.
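As a sketch using the discount example from the next section: suppose a buggy version had written `>=` instead of `>`, discounting orders of exactly $100. The fix starts with a test that pins the boundary. It fails against the buggy code, passes after the one-character fix, and remains in the suite as regression coverage.

// Red first: reproduce the reported bug before touching the code.
it('applies no discount at exactly $100', () => {
  expect(calculateDiscount(100)).toBe(100);
});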

Refactoring with confidence — After green, Claude can suggest refactors and will verify that the test suite still passes after each change.
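For instance, once the discount tests shown below are green, one plausible refactor is extracting the magic numbers into named constants, then rerunning the unchanged suite to confirm behaviour did not move.

// Refactor under green: behaviour identical, tests untouched.
const DISCOUNT_THRESHOLD = 100;
const DISCOUNT_RATE = 0.9;

export function calculateDiscount(amount: number): number {
  return amount > DISCOUNT_THRESHOLD ? amount * DISCOUNT_RATE : amount;
}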

Framework-Specific Behaviour

JavaScript/TypeScript (Jest, Vitest):

// Test first:
describe('calculateDiscount', () => {
  it('applies 10% for orders over $100', () => {
    expect(calculateDiscount(150)).toBe(135);
  });

  it('applies no discount for orders under $100', () => {
    expect(calculateDiscount(80)).toBe(80);
  });
});

// Implementation after:
export function calculateDiscount(amount: number): number {
  return amount > 100 ? amount * 0.9 : amount;
}

Python (pytest):

# Test first:
def test_calculate_discount_over_100():
    assert calculate_discount(150) == 135.0

def test_calculate_discount_under_100():
    assert calculate_discount(80) == 80

# Implementation after:
def calculate_discount(amount: float) -> float:
    return amount * 0.9 if amount > 100 else amount

The skill produces this structure automatically, in the right framework, for every implementation task.

Use Cases

New feature development — The standard use case. Describe the feature’s behaviour and Claude writes tests that capture that behaviour, then the implementation to satisfy them.

Bug fixing — Report a bug and Claude writes a test that reproduces it before touching the implementation. The test becomes permanent regression coverage.

API contract definition — Before implementing a new API endpoint, write tests that define the expected request/response shapes. This is a form of design-by-contract that catches integration issues early.
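A sketch of what this can look like; the `createUser` handler and its response shape are invented for illustration:

// Test first (hypothetical endpoint):
describe('createUser', () => {
  it('returns 201 with the new user id and the submitted email', async () => {
    const res = await createUser({ email: 'a@example.com', password: 'hunter2' });
    expect(res.status).toBe(201);
    expect(res.body).toEqual({ id: expect.any(String), email: 'a@example.com' });
  });
});

// Minimal implementation after (crypto.randomUUID() is global in Node 18+):
export async function createUser(input: { email: string; password: string }) {
  return { status: 201, body: { id: crypto.randomUUID(), email: input.email } };
}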

Refactoring legacy code — Add tests to legacy code before refactoring it. Claude can write characterisation tests as a safety net.
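A characterisation test asserts what the code currently does, not what it should do. A sketch, with a hypothetical `legacySlugify` function:

// The expected value was captured by running the existing code once,
// then frozen here, correct or not. Any refactor that changes the
// observable output now fails loudly.
it('characterises legacySlugify on a known input', () => {
  expect(legacySlugify('Hello,  World!')).toBe('hello-world');
});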

Specification by example — Write your requirements as test cases. This forces ambiguous requirements to become concrete and executable before implementation begins.
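In Jest or Vitest this often becomes a table-driven test, where each row is one concrete example of the requirement. A sketch using the discount rule from earlier:

// Each row is one executable example of the requirement.
it.each([
  [150, 135], // over $100: 10% off
  [80, 80],   // under $100: no discount
  [100, 100], // boundary: 'over' is strict
])('calculateDiscount(%i) === %i', (amount, expected) => {
  expect(calculateDiscount(amount)).toBe(expected);
});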

Combining with `subagent-driven-development`

The natural pairing is `test-driven-development` + `subagent-driven-development`. One subagent writes the tests; a separate subagent writes the implementation to satisfy them. This gives you:

  • Tests written without bias toward a particular implementation
  • Implementation constrained by tests written independently
  • A natural review gate: does the implementation make the tests pass?

This is the closest AI-assisted development gets to having two developers working on the same feature independently.

Tips and Gotchas

Specify your test framework explicitly. At the start of a session, tell Claude “we are using Vitest with React Testing Library” or “we are using pytest with fixtures in conftest.py.”

Test behaviour, not implementation. If Claude starts writing tests that check internal implementation details, push back. Tests should describe what the code does from the outside, not how it does it.
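A sketch of the difference, using a hypothetical `Cache` class:

// Tests behaviour: survives any internal refactor of the cache.
it('returns the stored value for a known key', () => {
  const cache = new Cache();
  cache.set('user:1', 'Ada');
  expect(cache.get('user:1')).toBe('Ada');
});

// Tests implementation details: breaks if the internal Map is renamed,
// even though the cache still works. This is the kind of test to reject.
it('stores entries in the internal map', () => {
  const cache = new Cache();
  cache.set('user:1', 'Ada');
  expect((cache as any).internalMap.size).toBe(1);
});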

100% coverage is not the goal. The skill does not push for coverage metrics. It pushes for tests of meaningful behaviour.

Run tests in CI. The skill makes Claude write tests; your CI pipeline makes them matter. Without automated test runs on every commit, TDD is a productivity tool. With CI, it is a safety net.

Final Thoughts

The test-driven-development skill does not make Claude a better programmer in an abstract sense. It makes Claude’s output verifiable. That distinction matters enormously when you are reviewing code generated at inference speed that you did not write yourself. Tests are not overhead — they are the mechanism by which AI-generated code becomes trustworthy code.

