Multimedia Process MCP Servers
29 MCP servers in the multimedia process category. Click any server for install commands, Claude Code setup, and GitHub source.
ComputerVision-based sorcery of image recognition and editing tools for AI assistants.
Convert HTML to PDF/PNG/WebP/PPTX slide carousels with 11 themes (LinkedIn, Instagram, pitch decks, infographics). Pixel-perfect Puppeteer rendering, dimension-aware reflow for portrait/landscape, token-efficient JSON mode. `npx slideshot-mcp`.
npx slideshot-mcpAI-powered video editing MCP server with 10 tools for timeline editing, 5-layer compositing, semantic operations, and FFmpeg rendering (1920x1080, 30fps H.264+AAC).
Suno AI music generation, lyrics, covers, and vocal extraction via Ace Data Cloud API.
Generate presentation PDFs, narrated videos, and MP3 audio from Markdown. Free tier requires no API key or local install โ add a URL to your IDE config and start generating. Paid tier adds video, audio, async jobs, and account tools.
An omnimodal AIGC framework that seamlessly converts ComfyUI workflows into MCP tools with zero code, enabling full-modal support for Text, Image, Sound, and Video generation with Chainlit-based web interface.
A MCP server for creating GIFs from your videos.
MCP server for ZapCap API providing video caption and B-roll generation via natural language
The first MCP server for Final Cut Pro. 53 tools that parse, edit, and generate FCPXML timelines โ health checks, flash frame detection, chapter markers, rough cuts, NLE export. 912 tests, MIT licensed.
Turn your self-hosted Immich photo library into a conversation โ natural language search via CLIP, geographic album curation from GPS data, cross-source duplicate detection with perceptual hashing, library health reports, and offline-ready interactive HTML galleries. Claude Code plugin with 21 MCP tools, 11 skills, and 5 slash commands.
AI podcast editing as a service. Upload raw audio or submit a URL, get back edited episodes with filler words removed, noise reduction, transcripts, show notes, and social clips. Includes webhooks for automation.
AI media generation (Flux, Veo, Suno) with cost control. Pre-commit pricing, budget enforcement, reserve-burn-refund billing.
Turn YouTube videos into short-form clips from any AI assistant. AI-powered highlight detection, subtitle generation, 9:16 reframing, and direct download URLs via the Crabcut API.
Full-resolution vision for LLMs. Tiles large images and captures web pages via Chrome CDP so vision models process every detail without downscaling. Generates interactive HTML tile previews. Supports Claude, OpenAI, Gemini presets with per-model token math and entropy-based tile classification.
Audio AI API for stem separation (vocal, drum, bass, guitar, piano), DME separation (dialogue, music, effects), and AI lyrics sync. 7 tools, 11 models, supports WAV/FLAC/MP3/M4A/MOV/MP4.
Generates app icons, favicons, OG images, logos, and wordmarks. Routes each request across 30+ image models. Runs without an API key via Cloudflare Workers AI, NVIDIA NIM, HuggingFace, or Stable Horde. Three modes: inline SVG, external prompt-only, or full API. Validates contrast, OCR text accuracy, and palette before returning.
Unified Gemini media generation: Nano Banana (images, editing, multi-reference composition), Veo 3.1 (video, image-to-video, extend), TTS, and Lyria 3 (music with vocals). Single Go binary, 12 tools, supports Gemini API key and Vertex AI.
MCP server for video analysis โ extracts transcripts, key frames, OCR text, and annotated timelines from video URLs. Supports Loom and direct video files (.mp4, .webm). Zero auth required.
AI 3D model generation and post-processing: text/image/multiview-to-3D via Tripo, plus retopology, format conversion (GLB/FBX/OBJ/STL/USDZ), and stylization. Single Go binary, 10 tools, async generation with polling.
Video editing MCP server with 26 tools for trimming, merging, text overlays, audio sync, filters, color grading, audio normalization, picture-in-picture, split-screen, batch processing, format conversion, subtitles, watermarks, and more. 380 tests, CI on Python 3.11+3.12, progress callbacks, works with Claude Code, Cursor, and any MCP client.
A MCP server that allows one to examine image metadata like EXIF, XMP, JFIF and GPS. This provides foundation for LLM-powered search and analysis of photo librares and image collections.
TypeScript MCP server for OpenAI Images/Videos and Google GenAI (Veo) media generation, editing, and asset downloads.
Comprehensive Sonos audio system control through pure TypeScript implementation. Features complete device discovery, multi-room playback management, queue control, music library browsing, alarm management, real-time event subscriptions, and audio EQ settings. Includes 50+ tools for seamless smart home audio automation via UPnP/SOAP protocols.
Agent-native media processing via Transloadit's 86+ Robots: video encoding (HLS, H.264, VP9), image manipulation (resize, watermark, smart crop), document conversion, OCR, speech transcription, and more. Hosted or self-hosted via npx.
Search and discover movies and TV shows with torrent links, quality scoring, streaming availability, and cast/crew metadata.
Using ffmpeg command line to achieve an mcp server, can be very convenient, through the dialogue to achieve the local video search, tailoring, stitching, playback and other functions
Comprehensive video and audio editing MCP server with advanced operations including trimming, merging, effects, overlays, format conversion, audio processing, YouTube downloads, and smart memory management for chaining operations without intermediate files
AI image enhancement (upscaling, denoising, sharpening) via Topaz Labs API. Supports 8 models including Standard V2, Wonder 2, Bloom, and Recover 3.
Headphone/IEM equalization database with 8,800+ models from AutoEQ. Search by name or sound signature, get parametric EQ settings, compare headphones band-by-band, and browse Harman preference score rankings. Includes automatic sound signature classification (Neutral, Warm, Bright, Dark, V-shaped, etc.).