data-engineering-data-pipeline
You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.
Install
mkdir -p .claude/skills/data-engineering-data-pipeline && curl -L -o skill.zip "https://mcp.directory/api/skills/download/2246" && unzip -o skill.zip -d .claude/skills/data-engineering-data-pipeline && rm skill.zip
Installs to .claude/skills/data-engineering-data-pipeline
About this skill
Data Pipeline Architecture
Use this skill when
- Working on data pipeline architecture tasks or workflows
- Needing guidance, best practices, or checklists for data pipeline architecture
Do not use this skill when
- The task is unrelated to data pipeline architecture
- You need a different domain or tool outside this scope
Requirements
$ARGUMENTS
Core Capabilities
- Design ETL/ELT, Lambda, Kappa, and Lakehouse architectures
- Implement batch and streaming data ingestion
- Build workflow orchestration with Airflow/Prefect
- Transform data using dbt and Spark
- Manage Delta Lake/Iceberg storage with ACID transactions
- Implement data quality frameworks (Great Expectations, dbt tests)
- Monitor pipelines with CloudWatch/Prometheus/Grafana
- Optimize costs through partitioning, lifecycle policies, and compute optimization
Instructions
1. Architecture Design
- Assess: sources, volume, latency requirements, targets
- Select pattern: ETL (transform before load), ELT (load then transform), Lambda (batch + speed layers), Kappa (stream-only), Lakehouse (unified)
- Design flow: sources → ingestion → processing → storage → serving
- Add observability touchpoints
2. Ingestion Implementation
Batch
- Incremental loading with watermark columns
- Retry logic with exponential backoff
- Schema validation and dead letter queue for invalid records
- Metadata tracking (_extracted_at, _source)
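A minimal sketch of the incremental batch pattern above, assuming a pandas + SQLAlchemy stack; the table, columns, and retry settings are illustrative:

import time
import pandas as pd
from sqlalchemy import create_engine, text

def extract_incremental(conn_str, table, watermark_col, last_watermark, retries=3):
    """Pull only rows newer than the stored watermark, retrying with backoff."""
    engine = create_engine(conn_str)
    query = text(f"SELECT * FROM {table} WHERE {watermark_col} > :wm")
    for attempt in range(retries):
        try:
            with engine.connect() as conn:
                df = pd.read_sql(query, conn, params={"wm": last_watermark})
            df["_extracted_at"] = pd.Timestamp.now(tz="UTC")  # metadata tracking
            df["_source"] = table
            return df
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s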
Streaming
- Kafka consumers with exactly-once semantics
- Manual offset commits within transactions
- Windowing for time-based aggregations
- Error handling and replay capability
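A minimal consumer-side sketch using confluent-kafka with manual commits. This gives at-least-once delivery; true exactly-once additionally requires a transactional producer. The broker, topic, and process() step are placeholders:

from confluent_kafka import Consumer

def process(payload: bytes):
    """Placeholder transform/load step."""
    print(payload[:80])

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # placeholder broker
    "group.id": "orders-etl",
    "enable.auto.commit": False,            # commit only after processing
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            continue  # route bad records to a DLQ topic in production
        process(msg.value())
        consumer.commit(asynchronous=False)  # offsets advance only on success
finally:
    consumer.close()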
3. Orchestration
Airflow
- Task groups for logical organization
- XCom for inter-task communication
- SLA monitoring and email alerts
- Incremental execution with execution_date
- Retry with exponential backoff
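A minimal TaskFlow sketch (Airflow 2.4+, where schedule replaces schedule_interval); the DAG, tasks, and retry values are illustrative:

from datetime import datetime, timedelta
from airflow.decorators import dag, task

default_args = {
    "retries": 3,
    "retry_delay": timedelta(minutes=1),
    "retry_exponential_backoff": True,
    "sla": timedelta(hours=2),
}

@dag(schedule="@daily", start_date=datetime(2024, 1, 1),
     catchup=False, default_args=default_args)
def orders_pipeline():
    @task
    def extract(data_interval_start=None):
        # Airflow injects the logical window, enabling incremental loads
        return {"since": str(data_interval_start)}

    @task
    def load(batch: dict):
        print(f"loading rows since {batch['since']}")

    load(extract())

orders_pipeline()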
Prefect
- Task caching for idempotency
- Parallel execution with .submit()
- Artifacts for visibility
- Automatic retries with configurable delays
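An equivalent Prefect 2 sketch; cache, retry, and source names are illustrative:

from datetime import timedelta
from prefect import flow, task
from prefect.tasks import task_input_hash

@task(retries=3, retry_delay_seconds=30,
      cache_key_fn=task_input_hash, cache_expiration=timedelta(hours=1))
def extract(source: str) -> list:
    return [{"source": source}]  # stand-in for a real extract

@task
def load(rows: list):
    print(f"loaded {len(rows)} rows")

@flow
def pipeline():
    # .submit() runs the extracts concurrently on the task runner
    futures = [extract.submit(s) for s in ("orders", "users")]
    for fut in futures:
        load(fut.result())

if __name__ == "__main__":
    pipeline()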
4. Transformation with dbt
- Staging layer: incremental materialization, deduplication, late-arriving data handling
- Marts layer: dimensional models, aggregations, business logic
- Tests: unique, not_null, relationships, accepted_values, custom data quality tests
- Sources: freshness checks, loaded_at_field tracking
- Incremental strategy: merge or delete+insert
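dbt models and tests live in SQL/YAML, but runs can be driven from Python via the programmatic runner in dbt-core 1.5+; the selector and target below are illustrative:

from dbt.cli.main import dbtRunner

dbt = dbtRunner()

# Build staging models incrementally, then run the schema tests
run_result = dbt.invoke(["run", "--select", "staging", "--target", "dev"])
test_result = dbt.invoke(["test", "--select", "staging"])

if not (run_result.success and test_result.success):
    raise RuntimeError("dbt run/test failed; inspect logs before promoting")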
5. Data Quality Framework
Great Expectations
- Table-level: row count, column count
- Column-level: uniqueness, nullability, type validation, value sets, ranges
- Checkpoints for validation execution
- Data docs for documentation
- Failure notifications
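A sketch using the classic pandas API (Great Expectations < 0.18; newer releases use the fluent context API instead). The sample frame is illustrative:

import pandas as pd
import great_expectations as ge

df = pd.DataFrame({"id": [1, 2, 3], "amount": [10.0, 5.5, 7.25]})
gdf = ge.from_pandas(df)

# Column-level expectations from the checklist above
gdf.expect_column_values_to_be_unique("id")
gdf.expect_column_values_to_not_be_null("id")
gdf.expect_column_values_to_be_between("amount", min_value=0)

# Table-level expectation
gdf.expect_table_row_count_to_be_between(min_value=1)

results = gdf.validate()
if not results.success:
    print(results)  # hook failure notifications here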
dbt Tests
- Schema tests in YAML
- Custom data quality tests with dbt-expectations
- Test results tracked in metadata
6. Storage Strategy
Delta Lake
- ACID transactions with append/overwrite/merge modes
- Upsert with predicate-based matching
- Time travel for historical queries
- Optimize: compact small files, Z-order clustering
- Vacuum to remove old files
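A sketch of the Delta Lake operations above, assuming an existing Delta-enabled SparkSession (spark) and a DataFrame of new rows (updates_df); the path and keys are illustrative:

from delta.tables import DeltaTable

# Upsert with predicate-based matching
target = DeltaTable.forPath(spark, "s3://lake/orders")
(target.alias("t")
    .merge(updates_df.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())

# Compact small files and cluster frequently filtered columns
spark.sql("OPTIMIZE delta.`s3://lake/orders` ZORDER BY (order_date)")

# Remove files outside the retention window (7 days)
target.vacuum(retentionHours=168)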
Apache Iceberg
- Partitioning and sort order optimization
- MERGE INTO for upserts
- Snapshot isolation and time travel
- File compaction with binpack strategy
- Snapshot expiration for cleanup
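The same upsert-and-maintain cycle for Iceberg via Spark SQL, assuming a catalog named lake and the new rows registered as a temp view updates; the table and values are illustrative:

# Upsert via MERGE INTO
spark.sql("""
    MERGE INTO lake.db.orders t
    USING updates s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# Compact small files with the binpack strategy
spark.sql("CALL lake.system.rewrite_data_files(table => 'db.orders', strategy => 'binpack')")

# Expire old snapshots to reclaim storage
spark.sql("CALL lake.system.expire_snapshots(table => 'db.orders', retain_last => 5)")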
7. Monitoring & Cost Optimization
Monitoring
- Track: records processed/failed, data size, execution time, success/failure rates
- CloudWatch metrics and custom namespaces
- SNS alerts for critical/warning/info events
- Data freshness checks
- Performance trend analysis
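A sketch of custom-namespace pipeline metrics with boto3; the namespace, metric names, and dimensions are illustrative:

import boto3

cloudwatch = boto3.client("cloudwatch")

def emit_run_metrics(pipeline: str, processed: int, failed: int, seconds: float):
    """Publish one run's core metrics to a custom CloudWatch namespace."""
    dims = [{"Name": "Pipeline", "Value": pipeline}]
    cloudwatch.put_metric_data(
        Namespace="DataPipelines",
        MetricData=[
            {"MetricName": "RecordsProcessed", "Value": processed, "Unit": "Count", "Dimensions": dims},
            {"MetricName": "RecordsFailed", "Value": failed, "Unit": "Count", "Dimensions": dims},
            {"MetricName": "ExecutionTime", "Value": seconds, "Unit": "Seconds", "Dimensions": dims},
        ],
    )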
Cost Optimization
- Partitioning: date- or entity-based; avoid over-partitioning (keep partitions above ~1 GB)
- File sizes: 512 MB-1 GB for Parquet
- Lifecycle policies: hot (Standard) → warm (IA) → cold (Glacier)
- Compute: spot instances for batch, on-demand for streaming, serverless for ad hoc workloads
- Query optimization: partition pruning, clustering, predicate pushdown
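A sketch of the hot → warm → cold tiering with boto3; the bucket, prefix, and day thresholds are illustrative:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="lake",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-orders",
            "Filter": {"Prefix": "orders/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # warm
                {"Days": 90, "StorageClass": "GLACIER"},      # cold
            ],
        }]
    },
)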
Example: Minimal Batch Pipeline
# Batch ingestion with validation (modules ship with this skill)
from batch_ingestion import BatchDataIngester
from storage.delta_lake_manager import DeltaLakeManager
from data_quality.expectations_suite import DataQualityFramework

ingester = BatchDataIngester(config={})

# Extract with incremental loading; the watermark comes from the previous
# run's stored state (placeholder value shown here)
last_run_timestamp = '2024-01-01T00:00:00Z'
df = ingester.extract_from_database(
    connection_string='postgresql://host:5432/db',
    query='SELECT * FROM orders',
    watermark_column='updated_at',
    last_watermark=last_run_timestamp,
)

# Validate against the expected schema; invalid records go to the DLQ
schema = {'required_fields': ['id', 'user_id'], 'dtypes': {'id': 'int64'}}
df = ingester.validate_and_clean(df, schema)

# Data quality checks
dq = DataQualityFramework()
result = dq.validate_dataframe(df, suite_name='orders_suite', data_asset_name='orders')

# Write to Delta Lake, partitioned by order date
delta_mgr = DeltaLakeManager(storage_path='s3://lake')
delta_mgr.create_or_update_table(
    df=df,
    table_name='orders',
    partition_columns=['order_date'],
    mode='append',
)

# Save failed records for replay
ingester.save_dead_letter_queue('s3://lake/dlq/orders')
Output Deliverables
1. Architecture Documentation
- Architecture diagram with data flow
- Technology stack with justification
- Scalability analysis and growth patterns
- Failure modes and recovery strategies
2. Implementation Code
- Ingestion: batch/streaming with error handling
- Transformation: dbt models (staging → marts) or Spark jobs
- Orchestration: Airflow/Prefect DAGs with dependencies
- Storage: Delta/Iceberg table management
- Data quality: Great Expectations suites and dbt tests
3. Configuration Files
- Orchestration: DAG definitions, schedules, retry policies
- dbt: models, sources, tests, project config
- Infrastructure: Docker Compose, K8s manifests, Terraform
- Environment: dev/staging/prod configs
4. Monitoring & Observability
- Metrics: execution time, records processed, quality scores
- Alerts: failures, performance degradation, data freshness
- Dashboards: Grafana/CloudWatch for pipeline health
- Logging: structured logs with correlation IDs
5. Operations Guide
- Deployment procedures and rollback strategy
- Troubleshooting guide for common issues
- Scaling guide for increased volume
- Cost optimization strategies and savings
- Disaster recovery and backup procedures
Success Criteria
- Pipeline meets defined SLA (latency, throughput)
- Data quality checks pass with >99% success rate
- Automatic retry and alerting on failures
- Comprehensive monitoring shows health and performance
- Documentation enables team maintenance
- Cost optimization reduces infrastructure costs by 30-50%
- Schema evolution without downtime
- End-to-end data lineage tracked