yara-rule-authoring


Guides authoring of high-quality YARA-X detection rules for malware identification. Use when writing, reviewing, or optimizing YARA rules. Covers naming conventions, string selection, performance optimization, migration from legacy YARA, and false positive reduction. Triggers on: YARA, YARA-X, malware detection, threat hunting, IOC, signature, crx module, dex module.

Install

mkdir -p .claude/skills/yara-rule-authoring && curl -L -o skill.zip "https://mcp.directory/api/skills/download/4616" && unzip -o skill.zip -d .claude/skills/yara-rule-authoring && rm skill.zip

Installs to .claude/skills/yara-rule-authoring

About this skill

YARA-X Rule Authoring

Write detection rules that catch malware without drowning in false positives.

This skill targets YARA-X, the Rust-based successor to legacy YARA. YARA-X powers VirusTotal's production systems and is the recommended implementation. See Migrating from Legacy YARA if you have existing rules.

Core Principles

  1. Strings must generate good atoms — YARA extracts 4-byte subsequences for fast matching. Strings with repeated bytes, common sequences, or under 4 bytes force slow bytecode verification on too many files.

  2. Target specific families, not categories — "Detects ransomware" catches everything and nothing. "Detects LockBit 3.0 configuration extraction routine" catches what you want.

  3. Test against goodware before deployment — A rule that fires on Windows system files is useless. Validate against VirusTotal's goodware corpus or your own clean file set.

  4. Short-circuit with cheap checks first — Put filesize < 10MB and uint16(0) == 0x5A4D before expensive string searches or module calls.

  5. Metadata is documentation — Future you (and your team) need to know what this catches, why, and where the sample came from.
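The principles above can be sketched as a single rule skeleton. This is a minimal illustration, not a real detection: the family name, mutex, PDB path, and every metadata value are hypothetical placeholders. It shows cheap checks placed ahead of string matching, strings long enough to yield usable atoms, and metadata that records what the rule catches and where the sample came from.

```yara
rule MAL_Win_ExampleFamily_Loader_Sketch
{
    meta:
        // All values are placeholders; replace them with real details.
        description = "Detects the hypothetical ExampleFamily loader via its mutex name and PDB path"
        author = "Your Name"
        reference = "https://example.invalid/report"
        hash = "<sha256 of the analyzed sample>"

    strings:
        // Hypothetical family-specific strings, each well over 4 bytes
        $mutex = "Global\\ExampleFamily_7f3a9c" ascii wide
        $pdb = "\\ExampleFamily\\Release\\loader.pdb" ascii

    condition:
        // Cheap checks first: size bound and MZ header
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        all of them
}
```

Before trusting a rule like this, check it with yr check and scan a clean file set, as principle 3 recommends.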

When to Use

  • Writing new YARA-X rules for malware detection
  • Reviewing existing rules for quality or performance issues
  • Optimizing slow-running rulesets
  • Converting IOCs or threat intel into detection signatures
  • Debugging false positive issues
  • Preparing rules for production deployment
  • Migrating legacy YARA rules to YARA-X
  • Analyzing Chrome extensions (crx module)
  • Analyzing Android apps (dex module)

When NOT to Use

  • Static analysis requiring disassembly → use Ghidra/IDA skills
  • Dynamic malware analysis → use sandbox analysis skills
  • Network-based detection → use Suricata/Snort skills
  • Memory forensics with Volatility → use memory forensics skills
  • Simple hash-based detection → just use hash lists

YARA-X Overview

YARA-X is the Rust-based successor to legacy YARA: 5-10x faster regex, better errors, built-in formatter, stricter validation, new modules (crx, dex), 99% rule compatibility.

Install: brew install yara-x (macOS) or cargo install yara-x

Essential commands: yr scan, yr check, yr fmt, yr dump

Platform Considerations

YARA works on any file type. Adapt patterns to your target:

| Platform | Magic Bytes | Bad Strings | Good Strings |
| --- | --- | --- | --- |
| Windows PE | uint16(0) == 0x5A4D | API names, Windows paths | Mutex names, PDB paths |
| macOS Mach-O | uint32(0) == 0xFEEDFACE (32-bit), uint32(0) == 0xFEEDFACF (64-bit), uint32be(0) == 0xCAFEBABE (universal) | Common Obj-C methods | Keylogger strings, persistence paths |
| JavaScript/Node | (none needed) | require, fetch, axios | Obfuscator signatures, eval+decode chains |
| npm/pip packages | (none needed) | postinstall, dependencies | Suspicious package names, exfil URLs |
| Office docs | uint32be(0) == 0x504B0304 (ZIP header "PK\x03\x04"; uint32(0) reads it as 0x04034B50) | VBA keywords | Macro auto-exec, encoded payloads |
| VS Code extensions | (none needed) | vscode.workspace | Uncommon activationEvents, hidden file access |
| Chrome extensions | Use crx module | Common Chrome APIs | Permission abuse, manifest anomalies |
| Android apps | Use dex module | Standard DEX structure | Obfuscated classes, suspicious permissions |

macOS Malware Detection

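Magic byte checks plus string patterns are enough for most macOS rules and need no module. (Recent YARA-X documentation also lists a macho module for structural checks; confirm it is available in your version before relying on it.)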

Magic bytes:

// Mach-O 32-bit
uint32(0) == 0xFEEDFACE
// Mach-O 64-bit
uint32(0) == 0xFEEDFACF
// Universal binary (fat binary)
uint32(0) == 0xCAFEBABE or uint32(0) == 0xBEBAFECA
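The two universal-binary constants above differ only in byte order. uint32() reads little-endian, so a file that begins with the bytes CA FE BA BE yields 0xBEBAFECA. The big-endian reader uint32be() lets you write the constant as it appears in a hex dump. A small sketch with a hypothetical rule name:

```yara
rule Example_MachO_Magic_Check
{
    condition:
        // uint32() reads little-endian; uint32be() reads big-endian.
        uint32(0) == 0xFEEDFACE or      // file bytes CE FA ED FE: 32-bit Mach-O
        uint32(0) == 0xFEEDFACF or      // file bytes CF FA ED FE: 64-bit Mach-O
        uint32be(0) == 0xCAFEBABE       // file bytes CA FE BA BE: universal binary
}
```

Java class files also begin with CA FE BA BE, so the universal-binary check should be paired with string or structural conditions rather than used alone.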

Good indicators for macOS malware:

  • Keylogger artifacts: CGEventTapCreate, kCGEventKeyDown
  • SSH tunnel strings: ssh -D, tunnel, socks
  • Persistence paths: ~/Library/LaunchAgents, /Library/LaunchDaemons
  • Credential theft: security find-generic-password, keychain

Example pattern from Airbnb BinaryAlert:

rule SUSP_Mac_ProtonRAT
{
    strings:
        // Library indicators
        $lib1 = "SRWebSocket" ascii
        $lib2 = "SocketRocket" ascii

        // Behavioral indicators
        $behav1 = "SSH tunnel not launched" ascii
        $behav2 = "Keylogger" ascii

    condition:
        // 64-bit Mach-O, or universal binary in either byte order
        (uint32(0) == 0xFEEDFACF or uint32(0) == 0xCAFEBABE or uint32(0) == 0xBEBAFECA) and
        any of ($lib*) and any of ($behav*)
}

JavaScript Detection Decision Tree

Writing a JavaScript rule?
├─ npm package?
│  ├─ Check package.json patterns
│  ├─ Look for postinstall/preinstall hooks
│  └─ Target exfil patterns: fetch + env access + credential paths
├─ Browser extension?
│  ├─ Chrome: Use crx module
│  └─ Others: Target manifest patterns, background script behaviors
├─ Standalone JS file?
│  ├─ Look for obfuscation markers: eval+atob, fromCharCode chains
│  ├─ Target unique function/variable names (often survive minification)
│  └─ Check for packed/encoded payloads
└─ Minified/webpack bundle?
   ├─ Target unique strings that survive bundling (URLs, magic values)
   └─ Avoid function names (will be mangled)

JavaScript-specific good strings:

  • Ethereum function selectors: { a9 05 9c bb } (transfer), { 70 a0 82 31 } (balanceOf)
  • Zero-width characters (steganography): { E2 80 8B E2 80 8C }
  • Obfuscator signatures: _0x, var _0x
  • Specific C2 patterns: domain names, webhook URLs

JavaScript-specific bad strings:

  • require, fetch, axios — too common
  • Buffer, crypto — legitimate uses everywhere
  • process.env alone — need specific env var names
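As a sketch of how the good and bad lists combine, the rule below avoids require, fetch, and bare process.env. It requires a specific secret environment variable together with a webhook destination. The rule name and the choice of strings are illustrative assumptions and have not been tested against goodware.

```yara
rule SUSP_JS_Env_Secret_To_Webhook_Sketch
{
    meta:
        description = "Sketch: script references a specific secret env var and a webhook endpoint"

    strings:
        // Specific env var names rather than process.env alone
        $env1 = "process.env.NPM_TOKEN" ascii
        $env2 = "process.env.AWS_SECRET_ACCESS_KEY" ascii

        // Exfil destinations: webhook URL prefixes
        $exfil1 = "discord.com/api/webhooks/" ascii
        $exfil2 = "hooks.slack.com/services/" ascii

    condition:
        filesize < 2MB and
        any of ($env*) and
        any of ($exfil*)
}
```

Legitimate CI tooling can contain the same pairing, so a production rule would still need a family-specific marker.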

Essential Toolkit

| Tool | Purpose |
| --- | --- |
| yarGen | Extract candidate strings: yarGen.py -m samples/ --excludegood → validate with yr check |
| FLOSS | Extract obfuscated/stack strings: floss sample.exe (when yarGen fails) |
| yr CLI | Validate: yr check, scan: yr scan -s, inspect: yr dump -m pe |
| signature-base | Study quality examples |
| YARA-CI | Goodware corpus testing before deployment |

Master these five. Don't get distracted by tool catalogs.

Rationalizations to Reject

When you catch yourself thinking these, stop and reconsider.

| Rationalization | Expert Response |
| --- | --- |
| "This generic string is unique enough" | Test against goodware first. Your intuition is wrong. |
| "yarGen gave me these strings" | yarGen suggests, you validate. Check each one manually. |
| "It works on my 10 samples" | 10 samples ≠ production. Use VirusTotal goodware corpus. |
| "One rule to catch all variants" | Causes FP floods. Target specific families. |
| "I'll make it more specific if we get FPs" | Write tight rules upfront. FPs burn trust. |
| "This hex pattern is unique" | Unique in one sample ≠ unique across malware ecosystem. |
| "Performance doesn't matter" | One slow rule slows entire ruleset. Optimize atoms. |
| "PEiD rules still work" | Obsolete. 32-bit packers aren't relevant. |
| "I'll add more conditions later" | Weak rules deployed = damage done. |
| "This is just for hunting" | Hunting rules become detection rules. Same quality bar. |
| "The API name makes it malicious" | Legitimate software uses same APIs. Need behavioral context. |
| "any of them is fine for these common strings" | Common strings + any = FP flood. Use any of only for individually unique strings. |
| "This regex is specific enough" | /fetch.*token/ matches all auth code. Add exfil destination requirement. |
| "The JavaScript looks clean" | Attackers poison legitimate code with injects. Check for eval+decode chains. |
| "I'll use .* for flexibility" | Unbounded regex = performance disaster + memory explosion. Use .{0,30}. |
| "I'll use --relaxed-re-syntax everywhere" | Masks real bugs. Fix the regex instead of hiding problems. |
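The two regex rows can be illustrated together. This sketch replaces the unbounded /fetch.*token/ with a bounded gap and adds a required destination string. The destination host is a made-up placeholder and the rule name is hypothetical.

```yara
rule Example_Bounded_Regex_Sketch
{
    strings:
        // Unbounded form to avoid: /fetch.*token/
        // Bounded form: at most 30 characters between the two literals
        $re_bounded = /fetch\(.{0,30}token/ ascii

        // Hypothetical exfil destination required alongside the regex
        $dest = "collect.example.invalid/upload" ascii

    condition:
        $re_bounded and $dest
}
```

The literal prefix fetch\( also gives the scanner a fixed byte sequence to search for before the regex engine runs.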

Decision Trees

Is This String Good Enough?

Is this string good enough?
├─ Less than 4 bytes?
│  └─ NO — find longer string
├─ Contains repeated bytes (0000, 9090)?
│  └─ NO — add surrounding context
├─ Is an API name (VirtualAlloc, CreateRemoteThread)?
│  └─ NO — use hex pattern of call site instead
├─ Appears in Windows system files?
│  └─ NO — too generic, find something unique
├─ Is it a common path (C:\Windows\, cmd.exe)?
│  └─ NO — find malware-specific paths
├─ Unique to this malware family?
│  └─ YES — use it
└─ Appears in other malware too?
   └─ MAYBE — combine with family-specific marker
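A small sketch of the tree's outcomes: weak candidates are left as comments and only strings that pass every branch are declared. The mutex and the config header bytes are invented for illustration.

```yara
rule Example_String_Quality_Sketch
{
    strings:
        // Rejected candidates, kept only as comments:
        //   "VirtualAlloc"      API name, present in much legitimate software
        //   { 90 90 90 90 }     repeated bytes, poor atom
        //   "cmd.exe"           common path

        // Hypothetical family-specific candidates that pass the tree
        $mutex = "Global\\ExFam_Lock_5d1e" ascii wide
        $cfg = { 45 58 46 4D 01 00 ?? ?? 10 27 00 00 }

    condition:
        uint16(0) == 0x5A4D and
        all of them
}
```

The wildcards in $cfg sit in the middle of the pattern, so fixed bytes remain on both sides for atom extraction.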

When to Use "all of" vs "any of"

Should I require all strings or allow any?
├─ Strings are individually unique to malware?
│  └─ any of them (each alone is suspicious)
├─ Strings are common but combination is suspicious?
│  └─ all of them (require the full pattern)
├─ Strings have different confidence levels?
│  └─ Group: all of ($core_*) and any of ($variant_*)
└─ Seeing many false positives?
   └─ Tighten: switch any → all, add more required strings

Lesson from production: Rules using any of ($network_*) where strings included "fetch", "axios", "http" matched virtually all web applications. Switching to require credential path AND network call AND exfil destination eliminated FPs.
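The grouping branch of the tree might look like the following sketch. Every string is a hypothetical placeholder. The $core_ strings must all be present, and any single $variant_ string is enough.

```yara
rule Example_Core_And_Variant_Groups_Sketch
{
    strings:
        // Required together: credential path plus exfil destination
        $core_cred = ".aws/credentials" ascii
        $core_dest = "exfil.example.invalid/upload" ascii

        // Any one suffices: per-variant markers
        $variant_v1 = "exfam-stealer/1.0" ascii
        $variant_v2 = "exfam-stealer/2.0" ascii

    condition:
        filesize < 5MB and
        all of ($core_*) and
        any of ($variant_*)
}
```

If false positives appear, the first tightening step is to move a $variant_ string into the $core_ group.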

When to Abandon a Rule Approach

Stop and pivot when:

  • yarGen returns only API names and paths → See When Strings Fail, Pivot to Structure

  • Can't find 3 unique strings → Probably packed. Target the unpacked version or detect the packer.

  • Rule matches goodware files → Strings aren't unique enough. 1-2 matches = investigate and tighten; 3-5 matches = find different indicators; 6+ matches = start over.

  • Performance is terrible even after optimization → Architecture problem. Split into multiple focused rules or add strict pre-filters.

  • Description is hard to write → The rule is too vague. If you can't explain what it catches, it catches too much.

Debugging False Positives

FP Investigation Flow:
│
├─ 1. Which string matched?
│     Run: yr scan -s rule

---

*Content truncated.*
