cocoindex
Comprehensive toolkit for developing with the CocoIndex library. Use when users need to create data transformation pipelines (flows), write custom functions, or operate flows via CLI or API. Covers building ETL workflows for AI data processing, including embedding documents into vector databases, building knowledge graphs, creating search indexes, or processing data streams with incremental updates.
Install
```shell
mkdir -p .claude/skills/cocoindex && curl -L -o skill.zip "https://mcp.directory/api/skills/download/234" && unzip -o skill.zip -d .claude/skills/cocoindex && rm skill.zip
```

Installs to `.claude/skills/cocoindex`.
CocoIndex
Overview
CocoIndex is an ultra-performant real-time data transformation framework for AI with incremental processing. This skill enables building indexing flows that extract data from sources, apply transformations (chunking, embedding, LLM extraction), and export to targets (vector databases, graph databases, relational databases).
Core capabilities:
- Write indexing flows - Define ETL pipelines using Python
- Create custom functions - Build reusable transformation logic
- Operate flows - Run and manage flows using CLI or Python API
Key features:
- Incremental processing (only processes changed data)
- Live updates (continuously sync source changes to targets)
- Built-in functions (text chunking, embeddings, LLM extraction)
- Multiple data sources (local files, S3, Azure Blob, Google Drive, Postgres)
- Multiple targets (Postgres+pgvector, Qdrant, LanceDB, Neo4j, Kuzu)
For detailed documentation: https://cocoindex.io/docs/
Search documentation: https://cocoindex.io/docs/search?q=url%20encoded%20keyword
When to Use This Skill
Use when users request:
- "Build a vector search index for my documents"
- "Create an embedding pipeline for code/PDFs/images"
- "Extract structured information using LLMs"
- "Build a knowledge graph from documents"
- "Set up live document indexing"
- "Create custom transformation functions"
- "Run/update my CocoIndex flow"
Flow Writing Workflow
Step 1: Understand Requirements
Ask clarifying questions to understand:
Data source:
- Where is the data? (local files, S3, database, etc.)
- What file types? (text, PDF, JSON, images, code, etc.)
- How often does it change? (one-time, periodic, continuous)
Transformations:
- What processing is needed? (chunking, embedding, extraction, etc.)
- Which embedding model? (SentenceTransformer, OpenAI, custom)
- Any custom logic? (filtering, parsing, enrichment)
Target:
- Where should results go? (Postgres, Qdrant, Neo4j, etc.)
- What schema? (fields, primary keys, indexes)
- Vector search needed? (specify similarity metric)
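To ground the similarity-metric question, cosine similarity (the most common choice for text embeddings) can be sketched in plain Python. This is illustrative only and independent of any CocoIndex API:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Parallel vectors score 1.0; orthogonal vectors score 0.0
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # → 1.0
```

Targets differ in which metrics they support natively, so it is worth pinning this down before choosing a backend.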
Step 2: Set Up Dependencies
Guide user to add CocoIndex with appropriate extras to their project based on their needs:
Required dependency:
- `cocoindex` - Core functionality, CLI, and most built-in functions
Optional extras (add as needed):
- `cocoindex[embeddings]` - For SentenceTransformer embeddings (when using `SentenceTransformerEmbed`)
- `cocoindex[colpali]` - For ColPali image/document embeddings (when using `ColPaliEmbedImage` or `ColPaliEmbedQuery`)
- `cocoindex[lancedb]` - For LanceDB target (when exporting to LanceDB)
- `cocoindex[embeddings,lancedb]` - Multiple extras can be combined
What's included:
- Base package: Core functionality, CLI, most built-in functions, Postgres/Qdrant/Neo4j/Kuzu targets
- `embeddings` extra: SentenceTransformers library for local embedding models
- `colpali` extra: ColPali engine for multimodal document/image embeddings
- `lancedb` extra: LanceDB client library for LanceDB vector database support
Users can install using their preferred package manager (pip, uv, poetry, etc.) or add to pyproject.toml.
For installation details: https://cocoindex.io/docs/getting_started/installation
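For example (commands are illustrative; any package manager works):

```shell
# With pip (quote the extras so the shell doesn't expand the brackets)
pip install "cocoindex[embeddings]"

# Or with uv, combining extras
uv add "cocoindex[embeddings,lancedb]"
```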
Step 3: Set Up Environment
Check existing environment first:

1. Check if `COCOINDEX_DATABASE_URL` exists in environment variables
   - If not found, use default: `postgres://cocoindex:cocoindex@localhost/cocoindex`
2. For flows requiring LLM APIs (embeddings, extraction):
   - Ask user which LLM provider they want to use:
     - OpenAI - both generation and embeddings
     - Anthropic - generation only
     - Gemini - both generation and embeddings
     - Voyage - embeddings only
     - Ollama - local models (generation and embeddings)
   - Check if the corresponding API key exists in environment variables
   - If not found, ask user to provide the API key value
   - Never create simplified examples without an LLM - always get the proper API key and use the real LLM functions
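The database-URL check above can be sketched as follows (plain Python, using the default documented in this skill):

```python
import os

# Fall back to the documented default when COCOINDEX_DATABASE_URL is unset
db_url = os.environ.get(
    "COCOINDEX_DATABASE_URL",
    "postgres://cocoindex:cocoindex@localhost/cocoindex",
)
print(db_url)
```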
Guide user to create a `.env` file:

```shell
# Database connection (required - internal storage)
COCOINDEX_DATABASE_URL=postgres://cocoindex:cocoindex@localhost/cocoindex

# LLM API keys (add the ones you need)
OPENAI_API_KEY=sk-...         # For OpenAI (generation + embeddings)
ANTHROPIC_API_KEY=sk-ant-...  # For Anthropic (generation only)
GOOGLE_API_KEY=...            # For Gemini (generation + embeddings)
VOYAGE_API_KEY=pa-...         # For Voyage (embeddings only)

# Ollama requires no API key (local)
```
For more LLM options: https://cocoindex.io/docs/ai/llm
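The key-presence check can be sketched as below; the provider-to-variable mapping follows the `.env` template above, and `missing_api_keys` is a hypothetical helper name, not a CocoIndex API:

```python
import os

# Environment variable expected for each hosted provider (Ollama needs none)
PROVIDER_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Gemini": "GOOGLE_API_KEY",
    "Voyage": "VOYAGE_API_KEY",
}

def missing_api_keys() -> list[str]:
    """Return providers whose API key is absent from the environment."""
    return [name for name, var in PROVIDER_KEYS.items() if not os.environ.get(var)]
```

Prompt the user for any provider returned by this check before writing the flow.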
Create basic project structure:

```python
# main.py
from dotenv import load_dotenv
import cocoindex


@cocoindex.flow_def(name="FlowName")
def my_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Flow definition here
    pass


if __name__ == "__main__":
    load_dotenv()
    cocoindex.init()
    my_flow.update()
```
Step 4: Write the Flow
Follow this structure:

```python
@cocoindex.flow_def(name="DescriptiveName")
def flow_name(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # 1. Import source data
    data_scope["source_name"] = flow_builder.add_source(
        cocoindex.sources.SourceType(...)
    )

    # 2. Create collector(s) for outputs
    collector = data_scope.add_collector()

    # 3. Transform data (iterate through rows)
    with data_scope["source_name"].row() as item:
        # Apply transformations
        item["new_field"] = item["existing_field"].transform(
            cocoindex.functions.FunctionName(...)
        )

        # Nested iteration (e.g., chunks within documents)
        with item["nested_table"].row() as nested_item:
            # More transformations
            nested_item["embedding"] = nested_item["text"].transform(...)

            # Collect data for export
            collector.collect(
                field1=nested_item["field1"],
                field2=item["field2"],
                generated_id=cocoindex.GeneratedField.UUID,
            )

    # 4. Export to target
    collector.export(
        "target_name",
        cocoindex.targets.TargetType(...),
        primary_key_fields=["field1"],
        vector_indexes=[...],  # If needed
    )
```
Key principles:
- Each source creates a field in the top-level data scope
- Use `.row()` to iterate through table data
- CRITICAL: Always assign transformed data to row fields - use `item["new_field"] = item["existing_field"].transform(...)`, NOT local variables like `new_field = item["existing_field"].transform(...)`
- Transformations create new fields without mutating existing data
- Collectors gather data from any scope level
- Export must happen at top level (not within row iterations)
Common mistakes to avoid:
❌ Wrong: Using local variables for transformations

```python
with data_scope["files"].row() as file:
    summary = file["content"].transform(...)  # ❌ Local variable
    summaries_collector.collect(filename=file["filename"], summary=summary)
```

✅ Correct: Assigning to row fields

```python
with data_scope["files"].row() as file:
    file["summary"] = file["content"].transform(...)  # ✅ Field assignment
    summaries_collector.collect(filename=file["filename"], summary=file["summary"])
```

❌ Wrong: Creating unnecessary dataclasses to mirror flow fields

```python
from dataclasses import dataclass

@dataclass
class FileSummary:  # ❌ Unnecessary - CocoIndex manages fields automatically
    filename: str
    summary: str
    embedding: list[float]

# This dataclass is never used in the flow!
```
Step 5: Design the Flow Solution
IMPORTANT: The patterns listed below are common starting points, but you cannot exhaustively enumerate all possible scenarios. When user requirements don't match existing patterns:
- Combine elements from multiple patterns - Mix and match sources, transformations, and targets creatively
- Review additional examples - See https://github.com/cocoindex-io/cocoindex?tab=readme-ov-file#-examples-and-demo for diverse real-world use cases (face recognition, multimodal search, product recommendations, patient form extraction, etc.)
- Think from first principles - Use the core APIs (sources, transforms, collectors, exports) and apply common sense to solve novel problems
- Be creative - CocoIndex is flexible; unique combinations of components can solve unique problems
Common starting patterns (use references for detailed examples):
For text embedding: Load references/flow_patterns.md and refer to "Pattern 1: Simple Text Embedding"
For code embedding: Load references/flow_patterns.md and refer to "Pattern 2: Code Embedding with Language Detection"
For LLM extraction + knowledge graph: Load references/flow_patterns.md and refer to "Pattern 3: LLM-based Extraction to Knowledge Graph"
For live updates: Load references/flow_patterns.md and refer to "Pattern 4: Live Updates with Refresh Interval"
For custom functions: Load references/flow_patterns.md and refer to "Pattern 5: Custom Transform Function"
For reusable query logic: Load references/flow_patterns.md and refer to "Pattern 6: Transform Flow for Reusable Logic"
For concurrency control: Load references/flow_patterns.md and refer to "Pattern 7: Concurrency Control"
Example of pattern composition:
If a user asks to "index images from S3, generate captions with a vision API, and store in Qdrant", combine:
- AmazonS3 source (from S3 examples)
- Custom function for vision API calls (from custom functions pattern)
- EmbedText to embed the captions (from embedding patterns)
- Qdrant target (from target examples)
No single pattern covers this exact scenario, but the building blocks are composable.
Step 6: Test and Run
Guide user through testing:
---
*Content truncated.*