zarr-python

Name: zarr-python
Author: davila7

0views

1installs

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

Install

mkdir -p .claude/skills/zarr-python && curl -L -o skill.zip "https://mcp.directory/api/skills/download/4471" && unzip -o skill.zip -d .claude/skills/zarr-python && rm skill.zip

Installs to .claude/skills/zarr-python

About this skill

Zarr Python

Overview

Zarr is a Python library for storing large N-dimensional arrays with chunking and compression. Apply this skill for efficient parallel I/O, cloud-native workflows, and seamless integration with NumPy, Dask, and Xarray.

Quick Start

Installation

uv pip install zarr

Requires Python 3.11+. For cloud storage support, install additional packages:

uv pip install s3fs  # For S3
uv pip install gcsfs  # For Google Cloud Storage

Basic Array Creation

import zarr
import numpy as np

# Create a 2D array with chunking and compression
z = zarr.create_array(
    store="data/my_array.zarr",
    shape=(10000, 10000),
    chunks=(1000, 1000),
    dtype="f4"
)

# Write data using NumPy-style indexing
z[:, :] = np.random.random((10000, 10000))

# Read data
data = z[0:100, 0:100]  # Returns NumPy array

Core Operations

Creating Arrays

Zarr provides multiple convenience functions for array creation:

# Create empty array
z = zarr.zeros(shape=(10000, 10000), chunks=(1000, 1000), dtype='f4',
               store='data.zarr')

# Create filled arrays
z = zarr.ones((5000, 5000), chunks=(500, 500))
z = zarr.full((1000, 1000), fill_value=42, chunks=(100, 100))

# Create from existing data
data = np.arange(10000).reshape(100, 100)
z = zarr.array(data, chunks=(10, 10), store='data.zarr')

# Create like another array
z2 = zarr.zeros_like(z)  # Matches shape, chunks, dtype of z

Opening Existing Arrays

# Open array (read/write mode by default)
z = zarr.open_array('data.zarr', mode='r+')

# Read-only mode
z = zarr.open_array('data.zarr', mode='r')

# The open() function auto-detects arrays vs groups
z = zarr.open('data.zarr')  # Returns Array or Group

Reading and Writing Data

Zarr arrays support NumPy-like indexing:

# Write entire array
z[:] = 42

# Write slices
z[0, :] = np.arange(100)
z[10:20, 50:60] = np.random.random((10, 10))

# Read data (returns NumPy array)
data = z[0:100, 0:100]
row = z[5, :]

# Advanced indexing
z.vindex[[0, 5, 10], [2, 8, 15]]  # Coordinate indexing
z.oindex[0:10, [5, 10, 15]]       # Orthogonal indexing
z.blocks[0, 0]                     # Block/chunk indexing

Resizing and Appending

# Resize array
z.resize(15000, 15000)  # Expands or shrinks dimensions

# Append data along an axis
z.append(np.random.random((1000, 10000)), axis=0)  # Adds rows

Chunking Strategies

Chunking is critical for performance. Choose chunk sizes and shapes based on access patterns.

Chunk Size Guidelines

Minimum chunk size: 1 MB recommended for optimal performance
Balance: Larger chunks = fewer metadata operations; smaller chunks = better parallel access
Memory consideration: Entire chunks must fit in memory during compression

# Configure chunk size (aim for ~1MB per chunk)
# For float32 data: 1MB = 262,144 elements = 512×512 array
z = zarr.zeros(
    shape=(10000, 10000),
    chunks=(512, 512),  # ~1MB chunks
    dtype='f4'
)

Aligning Chunks with Access Patterns

Critical: Chunk shape dramatically affects performance based on how data is accessed.

# If accessing rows frequently (first dimension)
z = zarr.zeros((10000, 10000), chunks=(10, 10000))  # Chunk spans columns

# If accessing columns frequently (second dimension)
z = zarr.zeros((10000, 10000), chunks=(10000, 10))  # Chunk spans rows

# For mixed access patterns (balanced approach)
z = zarr.zeros((10000, 10000), chunks=(1000, 1000))  # Square chunks

Performance example: For a (200, 200, 200) array, reading along the first dimension:

Using chunks (1, 200, 200): ~107ms
Using chunks (200, 200, 1): ~1.65ms (65× faster!)

Sharding for Large-Scale Storage

When arrays have millions of small chunks, use sharding to group chunks into larger storage objects:

from zarr.codecs import ShardingCodec, BytesCodec
from zarr.codecs.blosc import BloscCodec

# Create array with sharding
z = zarr.create_array(
    store='data.zarr',
    shape=(100000, 100000),
    chunks=(100, 100),  # Small chunks for access
    shards=(1000, 1000),  # Groups 100 chunks per shard
    dtype='f4'
)

Benefits:

Reduces file system overhead from millions of small files
Improves cloud storage performance (fewer object requests)
Prevents filesystem block size waste

Important: Entire shards must fit in memory before writing.

Compression

Zarr applies compression per chunk to reduce storage while maintaining fast access.

Configuring Compression

from zarr.codecs.blosc import BloscCodec
from zarr.codecs import GzipCodec, ZstdCodec

# Default: Blosc with Zstandard
z = zarr.zeros((1000, 1000), chunks=(100, 100))  # Uses default compression

# Configure Blosc codec
z = zarr.create_array(
    store='data.zarr',
    shape=(1000, 1000),
    chunks=(100, 100),
    dtype='f4',
    codecs=[BloscCodec(cname='zstd', clevel=5, shuffle='shuffle')]
)

# Available Blosc compressors: 'blosclz', 'lz4', 'lz4hc', 'snappy', 'zlib', 'zstd'

# Use Gzip compression
z = zarr.create_array(
    store='data.zarr',
    shape=(1000, 1000),
    chunks=(100, 100),
    dtype='f4',
    codecs=[GzipCodec(level=6)]
)

# Disable compression
z = zarr.create_array(
    store='data.zarr',
    shape=(1000, 1000),
    chunks=(100, 100),
    dtype='f4',
    codecs=[BytesCodec()]  # No compression
)

Compression Performance Tips

Blosc (default): Fast compression/decompression, good for interactive workloads
Zstandard: Better compression ratios, slightly slower than LZ4
Gzip: Maximum compression, slower performance
LZ4: Fastest compression, lower ratios
Shuffle: Enable shuffle filter for better compression on numeric data

# Optimal for numeric scientific data
codecs=[BloscCodec(cname='zstd', clevel=5, shuffle='shuffle')]

# Optimal for speed
codecs=[BloscCodec(cname='lz4', clevel=1)]

# Optimal for compression ratio
codecs=[GzipCodec(level=9)]

Storage Backends

Zarr supports multiple storage backends through a flexible storage interface.

Local Filesystem (Default)

from zarr.storage import LocalStore

# Explicit store creation
store = LocalStore('data/my_array.zarr')
z = zarr.open_array(store=store, mode='w', shape=(1000, 1000), chunks=(100, 100))

# Or use string path (creates LocalStore automatically)
z = zarr.open_array('data/my_array.zarr', mode='w', shape=(1000, 1000),
                    chunks=(100, 100))

In-Memory Storage

from zarr.storage import MemoryStore

# Create in-memory store
store = MemoryStore()
z = zarr.open_array(store=store, mode='w', shape=(1000, 1000), chunks=(100, 100))

# Data exists only in memory, not persisted

ZIP File Storage

from zarr.storage import ZipStore

# Write to ZIP file
store = ZipStore('data.zip', mode='w')
z = zarr.open_array(store=store, mode='w', shape=(1000, 1000), chunks=(100, 100))
z[:] = np.random.random((1000, 1000))
store.close()  # IMPORTANT: Must close ZipStore

# Read from ZIP file
store = ZipStore('data.zip', mode='r')
z = zarr.open_array(store=store)
data = z[:]
store.close()

Cloud Storage (S3, GCS)

import s3fs
import zarr

# S3 storage
s3 = s3fs.S3FileSystem(anon=False)  # Use credentials
store = s3fs.S3Map(root='my-bucket/path/to/array.zarr', s3=s3)
z = zarr.open_array(store=store, mode='w', shape=(1000, 1000), chunks=(100, 100))
z[:] = data

# Google Cloud Storage
import gcsfs
gcs = gcsfs.GCSFileSystem(project='my-project')
store = gcsfs.GCSMap(root='my-bucket/path/to/array.zarr', gcs=gcs)
z = zarr.open_array(store=store, mode='w', shape=(1000, 1000), chunks=(100, 100))

Cloud Storage Best Practices:

Use consolidated metadata to reduce latency: zarr.consolidate_metadata(store)
Align chunk sizes with cloud object sizing (typically 5-100 MB optimal)
Enable parallel writes using Dask for large-scale data
Consider sharding to reduce number of objects

Groups and Hierarchies

Groups organize multiple arrays hierarchically, similar to directories or HDF5 groups.

Creating and Using Groups

# Create root group
root = zarr.group(store='data/hierarchy.zarr')

# Create sub-groups
temperature = root.create_group('temperature')
precipitation = root.create_group('precipitation')

# Create arrays within groups
temp_array = temperature.create_array(
    name='t2m',
    shape=(365, 720, 1440),
    chunks=(1, 720, 1440),
    dtype='f4'
)

precip_array = precipitation.create_array(
    name='prcp',
    shape=(365, 720, 1440),
    chunks=(1, 720, 1440),
    dtype='f4'
)

# Access using paths
array = root['temperature/t2m']

# Visualize hierarchy
print(root.tree())
# Output:
# /
#  ├── temperature
#  │   └── t2m (365, 720, 1440) f4
#  └── precipitation
#      └── prcp (365, 720, 1440) f4

H5py-Compatible API

Zarr provides an h5py-compatible interface for familiar HDF5 users:

# Create group with h5py-style methods
root = zarr.group('data.zarr')
dataset = root.create_dataset('my_data', shape=(1000, 1000), chunks=(100, 100),
                              dtype='f4')

# Access like h5py
grp = root.require_group('subgroup')
arr = grp.require_dataset('array', shape=(500, 500), chunks=(50, 50), dtype='i4')

Attributes and Metadata

Attach custom metadata to arrays and groups using attributes:

# Add attributes to array
z = zarr.zeros((1000, 1000), chunks=(100, 100))
z.attrs['description'] = 'Temperature data in Kelvin'
z.attrs['units'] = 'K'
z.attrs['created'] = '2024-01-15'
z.attrs['processing_version'] = 2.1

# Attributes are stored as JSON
print(z.attrs['units'])  # Output: K

# Add attributes to groups
root = zarr.group('data.zarr')
root.attrs['project'] = 'Climate Analysis'
root.attrs['institution'] = 'Research Institute'

# Attributes persist with the array/group
z2 = zarr.open('data.zarr')
print(z2.attrs['description'])

Important: Attri

Content truncated.

More by davila7

View all skills by davila7 →

software-architecture

davila7

Guide for quality focused software architecture. This skill should be used when users want to write code, design architecture, analyze code, in any case that relates to software development.

1,259482

planning-with-files

davila7

Implements Manus-style file-based planning for complex tasks. Creates task_plan.md, findings.md, and progress.md. Use when starting complex multi-step tasks, research projects, or any task requiring >5 tool calls.

130314

scientific-brainstorming

davila7

Research ideation partner. Generate hypotheses, explore interdisciplinary connections, challenge assumptions, develop methodologies, identify research gaps, for creative scientific problem-solving.

332179

telegram-bot-builder

davila7

Expert in building Telegram bots that solve real problems - from simple automation to complex AI-powered bots. Covers bot architecture, the Telegram Bot API, user experience, monetization strategies, and scaling bots to thousands of users. Use when: telegram bot, bot api, telegram automation, chat bot telegram, tg bot.

160176

scroll-experience

davila7

Expert in building immersive scroll-driven experiences - parallax storytelling, scroll animations, interactive narratives, and cinematic web experiences. Like NY Times interactives, Apple product pages, and award-winning web experiences. Makes websites feel like experiences, not just pages. Use when: scroll animation, parallax, scroll storytelling, interactive story, cinematic website.

148106

humanizer

davila7

Remove signs of AI-generated writing from text. Use when editing or reviewing text to make it sound more natural and human-written. Based on Wikipedia's comprehensive "Signs of AI writing" guide. Detects and fixes patterns including: inflated symbolism, promotional language, superficial -ing analyses, vague attributions, em dash overuse, rule of three, AI vocabulary words, negative parallelisms, and excessive conjunctive phrases. Credits: Original skill by @blader - https://github.com/blader/humanizer

222105

ui-ux-pro-max

nextlevelbuilder

"UI/UX design intelligence. 50 styles, 21 palettes, 50 font pairings, 20 charts, 8 stacks (React, Next.js, Vue, Svelte, SwiftUI, React Native, Flutter, Tailwind). Actions: plan, build, create, design, implement, review, fix, improve, optimize, enhance, refactor, check UI/UX code. Projects: website, landing page, dashboard, admin panel, e-commerce, SaaS, portfolio, blog, mobile app, .html, .tsx, .vue, .svelte. Elements: button, modal, navbar, sidebar, card, table, form, chart. Styles: glassmorphism, claymorphism, minimalism, brutalism, neumorphism, bento grid, dark mode, responsive, skeuomorphism, flat design. Topics: color palette, accessibility, animation, layout, typography, font pairing, spacing, hover, shadow, gradient."

2,8892,530

pdf-to-markdown

aliceisjustplaying

Convert entire PDF documents to clean, structured Markdown for full context loading. Use this skill when the user wants to extract ALL text from a PDF into context (not grep/search), when discussing or analyzing PDF content in full, when the user mentions "load the whole PDF", "bring the PDF into context", "read the entire PDF", or when partial extraction/grepping would miss important context. This is the preferred method for PDF text extraction over page-by-page or grep approaches.

3,8201,662

flutter-development

aj-geddes

Build beautiful cross-platform mobile apps with Flutter and Dart. Covers widgets, state management with Provider/BLoC, navigation, API integration, and material design.

2,1561,645

drawio-diagrams-enhanced

jgtolentino

Create professional draw.io (diagrams.net) diagrams in XML format (.drawio files) with integrated PMP/PMBOK methodologies, extensive visual asset libraries, and industry-standard professional templates. Use this skill when users ask to create flowcharts, swimlane diagrams, cross-functional flowcharts, org charts, network diagrams, UML diagrams, BPMN, project management diagrams (WBS, Gantt, PERT, RACI), risk matrices, stakeholder maps, or any other visual diagram in draw.io format. This skill includes access to custom shape libraries for icons, clipart, and professional symbols.

2,2691,469

godot

bfollington

This skill should be used when working on Godot Engine projects. It provides specialized knowledge of Godot's file formats (.gd, .tscn, .tres), architecture patterns (component-based, signal-driven, resource-based), common pitfalls, validation tools, code templates, and CLI workflows. The `godot` command is available for running the game, validating scripts, importing resources, and exporting builds. Use this skill for tasks involving Godot game development, debugging scene/resource files, implementing game systems, or creating new Godot components.

2,4731,225

nano-banana-pro

garg-aayush

Generate and edit images using Google's Nano Banana Pro (Gemini 3 Pro Image) API. Use when the user asks to generate, create, edit, modify, change, alter, or update images. Also use when user references an existing image file and asks to modify it in any way (e.g., "modify this image", "change the background", "replace X with Y"). Supports both text-to-image generation and image-to-image editing with configurable resolution (1K default, 2K, or 4K for high resolution). DO NOT read the image file first - use this skill directly with the --input-image parameter.

1,961969

Related MCP Servers

Browse all servers

Azure All

Supercharge AI platforms with Azure MCP Server for seamless Azure API Management and resource automation. Public Preview now available!

1,20847 tools

Tencent CloudBase

Deploy, monitor, and manage cloud based DBMS and cloud database management tasks on Tencent CloudBase with AI-powered tools and hosting services.

9700 tools

Tencent CloudBase

Deploy, monitor, and manage full-stack apps on Tencent CloudBase—tools for cloud environments, databases, functions, hosting, and storage.

97038 tools

Google Cloud

Effortlessly manage Google Cloud with this user-friendly multi cloud management platform—simplify operations, automate tasks, and boost confidence.

7010 tools

Tencent EdgeOne Pages

Effortlessly deploy static sites with EdgeOne Pages—an easy, scalable alternative to Amazon website hosting for fast, reliable delivery.

3961 tools

Firebase

Use Firebase to integrate Firebase Authentication, Firestore, and Storage for seamless backend services in your apps.

2400 tools

Install

mkdir -p .claude/skills/zarr-python && curl -L -o skill.zip "https://mcp.directory/api/skills/download/4471" && unzip -o skill.zip -d .claude/skills/zarr-python && rm skill.zip

Installs to .claude/skills/zarr-python

Stats

Views

Installs

Author

davila7

7 skills published

Links

Source Code

zarr-python

Install

About this skill

Zarr Python

Overview

Quick Start

Installation

Basic Array Creation

Core Operations

Creating Arrays

Opening Existing Arrays

Reading and Writing Data

Resizing and Appending

Chunking Strategies

Chunk Size Guidelines

Aligning Chunks with Access Patterns

Sharding for Large-Scale Storage

Compression

Configuring Compression

Compression Performance Tips

Storage Backends

Local Filesystem (Default)

In-Memory Storage

ZIP File Storage

Cloud Storage (S3, GCS)

Groups and Hierarchies

Creating and Using Groups

H5py-Compatible API

Attributes and Metadata

More by davila7

software-architecture

planning-with-files

scientific-brainstorming

telegram-bot-builder

scroll-experience

humanizer

You might also like

ui-ux-pro-max

pdf-to-markdown

flutter-development

drawio-diagrams-enhanced

godot

nano-banana-pro

Related MCP Servers