real-pytest-no-mocks-real-tests

Write pytest tests that exercise real public interfaces with actual components, using no mocking and precise assertions. Includes MIRA-specific patterns. Use when creating or reviewing tests.

Install

mkdir -p .claude/skills/real-pytest-no-mocks-real-tests && curl -L -o skill.zip "https://mcp.directory/api/skills/download/7546" && unzip -o skill.zip -d .claude/skills/real-pytest-no-mocks-real-tests && rm skill.zip

Installs to .claude/skills/real-pytest-no-mocks-real-tests

About this skill

Real Testing Philosophy

CRITICAL MINDSET SHIFT

Tests that verify implementation are worse than no tests - they provide false confidence while catching nothing.

Your job is not to confirm the code works. Your job is to:

  1. Think critically about the contract - what SHOULD this module do?
  2. Surface design problems - is this module papering over architectural failures?
  3. Write tests that enforce guarantees - not tests that mirror implementation
  4. Prove tests can fail - see them fail first, verify failure modes are correct

Tests that always pass are actively harmful. They waste time and provide false security.

🚨 NEVER SKIP TESTS

ABSOLUTE RULE: Do NOT use @pytest.mark.skip, @pytest.mark.skipif, or pytest.skip()

Tests either:

  • PASS - the code works correctly
  • FAIL - the code is broken and needs fixing

There is no third state. Skipped tests are:

  • Technical debt pretending to be documentation
  • Broken code that someone gave up on
  • False confidence in test coverage metrics

If a test can't run:

  • Fix the environment/dependencies so it can run
  • Fix the code so the test passes
  • Delete the test if it's testing something that doesn't exist

NEVER commit a skipped test. Either make it pass or delete it.
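If a missing dependency is the reason a test "can't run", make the suite fail loudly rather than skip. A minimal sketch of this idea (the `require` helper is an illustration, not part of the skill):

```python
import importlib.util

def require(module_name: str) -> None:
    """Fail loudly (never skip) when a required dependency is missing."""
    if importlib.util.find_spec(module_name) is None:
        raise AssertionError(
            f"Required dependency '{module_name}' is missing - "
            "fix the environment instead of skipping the test"
        )

# Call at import time in a test module: a broken environment now FAILS the
# suite instead of silently shrinking coverage.
require("sqlite3")
```

This keeps the two-state model intact: either the environment is fixed and the test runs, or the run is red until someone fixes it.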


PHASE 1: Contract-First Analysis (DO THIS FIRST)

NEVER write tests by reading implementation. That's how you write tests that mirror what code does instead of what it should do.

Protocol: Analyze Contract Without Reading Implementation

Step 1: Read ONLY the module's public interface

# Read THIS (public interface)
from typing import Any, Dict

class ReminderTool:
    def run(self, operation: str, **kwargs) -> Dict[str, Any]:
        """Execute reminder operations."""
        pass

# DO NOT read implementation details
# DO NOT look at internal methods
# DO NOT read how it's implemented

Step 2: Document the contract

Before writing any test, answer these questions in writing:

MODULE CONTRACT ANALYSIS
========================

1. What is this module's PURPOSE?
   - What problem does it solve?
   - Why does it exist?

2. What GUARANTEES does it provide?
   - What promises does the API make?
   - What invariants must hold?
   - What post-conditions are guaranteed?

3. What should SUCCEED?
   - Valid inputs
   - Happy path scenarios
   - Boundary cases that should work

4. What should FAIL?
   - Invalid inputs
   - Boundary conditions that should error
   - Security violations
   - Resource constraints

5. What are the DEPENDENCIES?
   - What does this module depend on?
   - Are there too many dependencies?
   - Could this be simpler?

6. ARCHITECTURAL CONCERNS:
   - Is this module doing too much?
   - Is it papering over design failures elsewhere?
   - Does the contract make sense or is it convoluted?
   - Should this module even exist?

Step 3: Design test cases from contract

Based on contract analysis (NOT implementation):

  • List positive test cases (what should work)
  • List negative test cases (what should fail)
  • List boundary conditions
  • List security concerns
  • List performance concerns

See "CANONICAL EXAMPLE" section below for complete contract analysis walkthrough.


PHASE 1.5: Contract Verification (VALIDATE YOUR ASSUMPTIONS)

CRITICAL: Do NOT read the implementation file yourself. Use the contract-extractor agent as an abstraction barrier.

Why This Phase Exists

You've formed expectations about the contract from the interface. Now verify those expectations against actual implementation WITHOUT seeing the implementation yourself. The agent reads the code and reports ONLY contract facts (not implementation details).

Protocol: Invoke Agent → Compare → Identify Gaps

Step 1: Invoke the contract-extractor agent

# Use Task tool to invoke the agent
Task(
    subagent_type="contract-extractor",
    description="Extract contract from module",
    prompt="""Extract the contract from: path/to/module.py

Return:
- Public interface (methods, signatures, types)
- Actual return structures (dict keys, types)
- Exception contracts (what raises what, when)
- Edge cases handled
- Dependencies and architectural concerns"""
)

Step 2: Compare your expectations against agent report

Create a comparison:

EXPECTATION vs REALITY
======================

Expected return structure:
{
    "status": str,
    "results": list
}

Actual return structure (from agent):
{
    "status": str,
    "confidence": float,  # I MISSED THIS
    "results": list,
    "result_count": int   # I MISSED THIS
}

Expected exceptions:
- ValueError for empty query

Actual exceptions (from agent):
- ValueError for empty query ✓
- ValueError for negative max_results  # I MISSED THIS

Expected edge cases:
- Empty results returns []

Actual edge cases (from agent):
- Empty results returns status="low_confidence", confidence=0.0, results=[]
  # More nuanced than I expected

Step 3: Identify discrepancies and their implications

For each discrepancy, ask:

  • Is the code wrong (doesn't match intended contract)?
  • Is the contract unclear (missing documentation)?
  • Did I misunderstand the requirements?
  • Is this an undocumented feature (needs test)?

Example Analysis:

DISCREPANCY: Agent reports confidence field in return, I didn't expect it
IMPLICATION: This is part of the contract - add test to verify confidence in [0.0, 1.0]

DISCREPANCY: Agent reports ValueError for negative max_results, I didn't expect it
IMPLICATION: Good edge case handling - add negative test

DISCREPANCY: Agent reports 8 dependencies, I expected 3-4
IMPLICATION: ARCHITECTURAL CONCERN - too many deps, report to human

Step 4: Update test plan based on verified contract

Now you know:

  • What the code actually returns (test these exact structures)
  • What exceptions are actually raised (test these exact cases)
  • What edge cases are actually handled (test these behaviors)
  • What architectural problems exist (report these to human)

Step 5: Design comprehensive test cases

# Based on VERIFIED contract (not assumptions):

# Positive tests
- test_search_returns_exact_structure  # Verify all keys agent reported
- test_search_confidence_in_valid_range  # Agent said 0.0-1.0
- test_search_respects_max_results  # Agent confirmed this guarantee

# Negative tests
- test_search_rejects_empty_query  # Agent confirmed ValueError
- test_search_rejects_negative_max_results  # Agent revealed this

# Edge cases
- test_search_empty_results_structure  # Agent showed exact structure
- test_search_with_no_user_data  # Based on RLS info from agent

# Architectural concerns
- Report to human: "Module has 8 dependencies - possible SRP violation"

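"Test these exact structures" is easiest to enforce with set equality over the returned keys, which fails on both missing and unexpected fields. A sketch in which an inlined dict stands in for the real `search_tool.run(...)` call; the keys mirror the verified contract discussed above:

```python
def test_search_returns_exact_structure():
    # Stand-in for: result = search_tool.run(operation="search", ...)
    result = {"status": "ok", "confidence": 0.9,
              "results": ["r1"], "result_count": 1}

    # Set equality catches BOTH missing and unexpected keys.
    assert set(result) == {"status", "confidence", "results", "result_count"}
    assert 0.0 <= result["confidence"] <= 1.0
    assert result["result_count"] == len(result["results"])
```

An `"in"` check alone would silently accept extra undocumented fields; set equality forces the contract to be updated when the structure changes.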
See "CANONICAL EXAMPLE" section below for complete agent invocation, comparison, and gap analysis walkthrough.

When to Read Implementation

Only AFTER writing tests based on verified contract. Then you can read implementation for context, debugging, or refactoring - but tests are already protecting the contract.


PHASE 2: Fail-First Verification (PROVE TESTS CAN FAIL)

A test that always passes proves nothing. You must see it fail.

Protocol: Write → Fail → Verify

Step 1: Write test based on contract expectations

Don't look at implementation. Write assertions based on what the contract says SHOULD happen.

def test_search_returns_confidence_score(search_tool, authenticated_user):
    """Contract: search must return confidence score between 0.0 and 1.0"""
    user_id = authenticated_user["user_id"]
    set_current_user_id(user_id)

    # Based on contract, not implementation
    result = search_tool.run(
        operation="search",
        query="Python async patterns",
        max_results=5
    )

    # Contract expectations
    assert "confidence" in result
    assert 0.0 <= result["confidence"] <= 1.0
    assert "results" in result
    assert len(result["results"]) <= 5

Step 2: Run the test - expect failure or question success

pytest tests/test_search_tool.py::test_search_returns_confidence_score -v

If test FAILS:

  • Is this the expected failure? (No data exists yet)
  • Is the failure message clear?
  • Is this exposing a bug in the code?
  • Is this exposing a problem with the contract?

If test PASSES immediately:

  • Is the code actually correct?
  • Are my assertions too weak?
  • Am I testing a trivial case?
  • Did I set up test data somewhere I forgot about?

Step 3: Verify the test can actually catch bugs

Temporarily break the code and verify the test fails:

# In the actual implementation, temporarily break it:
def run(self, operation, **kwargs):
    return {"confidence": 2.5}  # INTENTIONAL BUG: exceeds 1.0

Run test - it should fail. If it doesn't, your assertions are too weak.
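The difference is visible even outside pytest: run a weak check and a precise check against the intentionally broken return value (the dict below mirrors the intentional bug above):

```python
buggy_result = {"confidence": 2.5}  # the intentional bug: exceeds 1.0

# Weak assertion: still passes, so it would never catch this bug.
assert buggy_result is not None

# Precise assertion: the bound check correctly rejects the value.
in_range = 0.0 <= buggy_result["confidence"] <= 1.0
assert in_range is False  # the strong check detects the bug
```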

Step 4: Remove the intentional bug, test should pass

Now you have confidence the test actually works.


Common Testing Anti-Patterns

When writing tests, surface design problems - don't paper over them.

  • Mocking - Tests the mocks, not the code, and hides integration issues. Instead: use real services (sqlite_test_db, test_db); if the code is hard to test, fix the design.
  • Reading implementation first - Tests mirror HOW instead of WHAT, confirming current behavior without catching regressions. Instead: analyze the contract WITHOUT reading code; use the contract-extractor agent.
  • Tests that mirror implementation - Testing that a method calls BM25 then embeddings (HOW) instead of testing that it returns relevant results (WHAT). Instead: test observable contract behavior, not internal paths.
  • Weak assertions - assert result is not None says nothing. Instead: be precise, e.g. assert 0.0 <= result["confidence"] <= 1.0.
  • Only happy paths - Missing adversarial cases means bugs slip through. Instead: test failure cases: empty inputs, invalid values, boundary conditions.
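Replacing a mock with a real service can be as light as an in-memory SQLite database. A sketch, where the table schema and function names are illustrative rather than the skill's actual fixtures:

```python
import sqlite3

def make_test_db() -> sqlite3.Connection:
    """A real in-memory database: full SQL behavior, no mocks."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reminders (id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
    )
    return conn

def test_reminder_round_trip():
    db = make_test_db()
    db.execute("INSERT INTO reminders (text) VALUES (?)", ("ship it",))
    rows = db.execute("SELECT text FROM reminders").fetchall()
    # Precise: exact content, not just 'not None'.
    assert rows == [("ship it",)]
```

Because the database is real, this test would catch SQL errors, constraint violations, and type coercion issues that a mocked connection would silently absorb.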

