Search & Data Extraction MCP Servers
130 MCP servers in the search & data extraction category. Click any server for install commands, Claude Code setup, and GitHub source.
Search and crawl in one API
[Vectorize](https://vectorize.io) MCP server for advanced retrieval, Private Deep Research, Anything-to-Markdown file extraction and text chunking.
Obtains latest dependency details for Clojure libraries.
Google News search capabilities with automatic topic categorization and multi-language support via SerpAPI integration.
GXtract is an MCP server designed to integrate with VS Code and other compatible editors (documentation: [sascharo.github.io/gxtract](https://sascharo.github.io/gxtract)). It provides a suite of tools for interacting with the GroundX platform, letting you use its document understanding capabilities directly within your development environment.
Fast, token-efficient web content extraction that converts websites to clean Markdown. Features Mozilla Readability, smart caching, polite crawling with robots.txt support, and concurrent fetching with minimal dependencies.
High-quality screenshot capture optimized for Claude Vision API. Automatically tiles full pages into 1072x1072 chunks (1.15 megapixels) with configurable viewports and wait strategies for dynamic content.
Kagi search API integration
Web search server that integrates Perplexity Sonar models via OpenRouter API for real-time, context-aware search with citations
A Model Context Protocol Server for [SearXNG](https://docs.searxng.org)
YouTube transcript extraction for AI agents. Clean text, timestamps, or structured JSON from any video. No API keys required. Install via `npx rippr-mcp`.
An MCP server for searching job listings with filters for date, keywords, remote work options, and more.
Audit URLs for AI crawler readiness: checks robots.txt, llms.txt, JSON-LD schema, and content density with 0-100 AEO scoring.
Web search using free multi-engine search (NO API KEYS REQUIRED). Supports Bing, Baidu, DuckDuckGo, Brave, Exa, and CSDN.
Google SERP search including web, images, news, maps, places, videos, and knowledge graph results via Ace Data Cloud API.
Crawl websites into clean Markdown, search pages, and extract structured data with LLMs. Built-in MCP server for web research and RAG pipelines.
An MCP server for taking screenshots of webpages to use as feedback during UI development.
MCP for LLM to search and read papers from arXiv
MCP to search and read medical / life sciences papers from PubMed.
Search articles using the NYTimes API
An MCP server for Apify's open-source RAG Web Browser Actor to perform web searches, scrape URLs, and return content in Markdown.
Search 800 000+ Polish public tenders (BZP + TED). Profiles of procuring entities and contractors by NIP, market statistics by CPV/province, 90+ term procurement glossary.
Multi-provider search broker with automatic fallback, RRF ranking, content extraction, and budget enforcement.
Pay-per-use web research for AI agents on Apify. Search (Brave + DuckDuckGo), fetch pages to clean markdown, and multi-step research with relevance scoring and key fact extraction.
Search ArXiv research papers
Model Context Protocol Server for looking up company ethics information. Learn about the ethical and unethical actions of major companies.
Web search capabilities using Brave's Search API
A comprehensive MCP server that enables LLMs to explore and interact with the Fediverse through ActivityPub protocol. Features WebFinger discovery, timeline fetching, instance exploration, and cross-platform support for Mastodon, Pleroma, Misskey, and other ActivityPub servers.
Modern, cross-platform MCP server enabling AI assistants to browse and interact with both Gopher protocol and Gemini protocol resources safely and efficiently. Features dual protocol support, TLS security, and structured content extraction.
Unsplash photo search with proper attribution. Returns ready-to-use attribution text and HTML for each photo, making it easy for LLMs to build content pages with properly credited images. Includes search, random photos, and download tracking.
MCP server that captures webpage screenshots, with viewport or full-page options and base64 PNG output.
This is a Python-based MCP server that provides OpenAI `web_search` built-in tool.
Crawleo Search & Crawl API
Work with Kagi *without* API access (you'll need to be a customer, though). Searches and summarizes. Uses a Kagi session token for easy authentication.
Enable fast, free real-time web search and access premium data from trusted media brands: news, financial markets, sports, entertainment, weather, and more. Build powerful AI agents with Dappier.
Local MCP server for searching 300,000+ foods, nutrition facts, and barcodes from the OpenNutrition database.
Crawl, embed, chunk, search, and retrieve information from datasets through [Trieve](https://trieve.ai)
Fast domain availability aggregator with pricing. Checks Porkbun, Namecheap, GoDaddy, RDAP & WHOIS. Includes bulk search, registrar comparison, AI-powered suggestions, and social media handle checking.
MCP server for Google Search Console & Indexing API: 13 tools for search analytics, sitemaps, URL inspection, and batch indexing.
Full domain lifecycle management: availability checking (zero config), registration, DNS, SSL, email auth (SPF/DKIM/DMARC), and WHOIS across Porkbun, Namecheap, GoDaddy, and Cloudflare. 21 tools.
Official remote MCP server for Muumuu Domain (GMO Pepabo). Search and register domains, manage owned domains and contracts, and configure DNS records via natural language.
Access data, web scraping, and document conversion APIs by [Dumpling AI](https://www.dumplingai.com/)
Collection of B2B sales intelligence MCP servers. Includes website analysis, tech stack detection, hiring signals, review aggregation, ad tracking, social profiles, financial reporting and more for AI-powered prospecting by [Ekas](https://ekas.io/)
Plays [Melrōse](https://melrōse.org) music expressions as MIDI
Decompose text into classified semantic units with authority, risk, attention scores, and entity extraction. No LLM. Deterministic. Works as MCP server or CLI.
An MCP server to search Hacker News, get top stories, and more.
Search 1M+ enriched job listings from 20,000+ companies. Filter by skills, salary, location, seniority, remote type, and more. Free: 500 calls/day, no signup required. Also available as a remote MCP server at `https://mcp.jobdatalake.com`.
Search Fiverr gigs, view seller profiles, compare pricing packages, and read reviews. No API key required.
MCP server that transcribes YouTube videos to text. Uses yt-dlp to download audio and OpenAI's Whisper-1 for more precise transcription than YouTube captions. Provide a YouTube URL and get back the full transcript, split into chunks for long videos.
Zero-cost, privacy-first universal web search MCP server. Enforces a **Search-First** paradigm that instructs LLMs to retrieve real-time information before answering factual questions. Supports 10+ search engines (DuckDuckGo, Bing, Google, Brave, Wikipedia, Arxiv, YouTube, Reddit) and deep page browsing. No API key required.
A Ktor-based MCP server written in Kotlin that applies multi-agent schools in a flexible research system, for use with coding or general-purpose research.
Pay-per-use semantic web search for AI agents. Powered by SearXNG; agents pay in sats via Lightning Network micropayments, with no API keys required. Self-hosted with phoenixd.
AI-powered domain brainstorming, analysis, and availability checking via AgentDomainService.com. Generate creative domain names from descriptions, get AI scoring for brandability/memorability, and check real-time availability with pricing. No API keys required.
Remote MCP server providing structured data APIs for Google (Search, Maps, Trends, Flights), Amazon, Airbnb, Zillow, Yelp, and more. 40+ tools return clean JSON data instead of relying on browser automation or raw HTML scraping. Designed for AI agents requiring reliable hosted data access.
MCP to search through PapersWithCode API
An MCP server for Unsplash image search.
Access tens of thousands of remote job listings and company information. This public MCP server provides real-time access to Himalayas' remote jobs database.
An integration that allows Claude Desktop to interact with Hacker News using the Model Context Protocol (MCP).
A Model Context Protocol (MCP) server that enables Claude Desktop to check domain availability across 50+ TLDs. Features DNS/WHOIS verification, bulk checking, and smart suggestions. Zero-clone installation via uvx.
Model Context Protocol Server for aggregating RSS feeds in Claude Desktop.
RSS feeds MCP server with 8 tools: fetch, filter, search, and manage RSS feeds by category or source. Zero config, no API keys required.
MCP server that provides comprehensive WHOIS lookup capabilities using the IP2WHOIS API. This server allows AI agents to query domain registration details, including expiry dates, registrar information, and registrant data.
MCP server for Naver Search API integration, supporting blog, news, shopping search and DataLab analytics features.
MCP server for fetching web page content using a Playwright headless browser, supporting JavaScript rendering and intelligent content extraction, with output in Markdown or HTML format.
A powerful MCP server for Google search that enables parallel searching with multiple keywords simultaneously.
Integrate AI assistants with Overseerr and Seerr (the unified successor) for automated media discovery, requests, and management in Plex, Jellyfin, and Emby ecosystems.
An MCP server for searching and downloading royalty-free stock photography from Pexels and Unsplash. Features multi-provider search, rich metadata, pagination support, and async performance for AI assistants to find and access high-quality images.
Convert any URL to LLM-ready Markdown via real Chrome browsers. 3 tools: scrape, crawl, search. Free via MCP, pay-per-use via x402. Remote MCP endpoint: `https://anybrowse.dev/mcp`
Stop bloating your LLM context. Query & Extract only what you need from your JSON files.
Extracts clean web content for RAG and provides Q&A about web pages.
Tavily AI search API
Web search capabilities using Microsoft Bing Search API
Real-time Korean web data: Naver place reviews, Melon music chart, Daangn/Bunjang marketplace listings, Naver news, Musinsa fashion rankings. 7 tools powered by Apify actors. Requires APIFY_TOKEN.
Extract content from URLs, documents, videos, and audio files using intelligent auto-engine selection. Supports web pages, PDFs, Word docs, YouTube transcripts, and more with structured JSON responses.
MCP server that lets AI assistants control LinkedIn accounts and retrieve real-time data.
MCP server for MinerU document parsing API. Parse PDFs, images, DOCX, and PPTX with OCR (109 languages), batch processing (200 docs), page ranges, and local file upload. 73% token reduction with structured output.
Discover, extract, and interact with the web - one interface powering automated access across the public internet.
Search marketplaces (TCGPlayer, Reverb, Thumbtack), verify professional licenses (contractor, nurse across US states), and look up PSA card grading population data.
Web, Image, News, Video, and Local Point of Interest search capabilities using Brave's Search API
Ultra-fast web fetcher and MCP server with HTTP/3, JS rendering, anti-fingerprinting, browser cookie auth, and 1Password integration. Fetches any URL as clean Markdown for AI context.
Efficient web content fetching and processing for AI consumption
Search Google and do deep web research on any topic
Real-time world news for AI agents: events clustered from hundreds of sources, classified by topic and geography, ranked by importance. Free, no API key. `npx -y @newsmcp/server`
Web search (embedded SearXNG), content extraction, and library docs indexing with hybrid search (FTS5 + semantic). Built-in Qwen3 embedding, no API keys required.
Web search using DuckDuckGo
"primitive" RAG-like web search model context protocol (MCP) server that runs locally. No APIs needed.
Specialized MCP server for cryptocurrency project documentation management with multi-blockchain support (Ethereum, BSC, Polygon, Solana).
Web intelligence MCP server for AI agents. 7 tools for SERP analysis, competitor research, market trends, content gap analysis, keyword insights, audience discovery, and citation tracking. Install via `pip install scout-intel-mcp`.
Official MinerU document parsing MCP ([mineru-open-mcp](https://pypi.org/project/mineru-open-mcp/) on PyPI). Converts PDFs, doc/docx/ppt/pptx, images, and spreadsheets to Markdown via the [MinerU](https://mineru.net) API; free Flash mode without an API key (about 20 pages per file); optional `MINERU_API_TOKEN` for higher limits.
PDF extraction router with built-in MCP server. Classifies each page (digital, scanned, tables) and routes to the best backend (PyMuPDF, Docling, OCR, or optional LLM fallback). Per-page confidence scoring flags low-quality pages and auto-reextracts them, preventing silent RAG failures. Zero config: `pip install pdfmux`. MIT licensed.
Highest Accuracy Web Search for AI
Unified web layer for AI agents. Search (8 engines), stealth browse, cookie auth, and act on 24 platforms. 5,000 free searches/month via Gemini Grounded Search.
Highest Accuracy Deep Research and Batch Tasks MCP
Search GOV.UK content, retrieve full government pages, look up organisations, and resolve UK postcodes to local authorities. 5 read-only tools, no API keys required.
Natural language API discovery: search 700+ API capabilities, get endpoints, auth setup, and code snippets. Supports auto-discovery of new APIs.
Advanced search and retrieval for web crawler data. Supports WARC, wget, Katana, SiteOne, and InterroBot crawlers.
German public procurement data (OCDS): semantic search, tender matching with company profiles, and structured filtering.
Unofficial MCP server for searching and retrieving scientific data from the Catalysis Hub database, providing access to computational catalysis research and surface reaction data.
Access Dutch Parliament (Tweede Kamer) information including documents, debates, activities, and legislative cases through structured search capabilities (based on opentk project by Bert Hubert)
MCP server providing OpenAI/Perplexity-like autonomous deep research, structured query elaboration, and concise reporting.
An MCP server that lets AI assistants use the Wolfram Alpha API for real-time access to computational knowledge and data.
Search products and stores in nearby physical stores. Find what you need locally instead of waiting for delivery. Remote MCP server (Streamable HTTP, no API key required).
Multi-source search across code registries (GitHub, npm, PyPI), academic indexes (arXiv, Semantic Scholar), social platforms (HN, Reddit, X), and community blogs (Dev.to, Hashnode, Qiita, Zenn). Parallel fetch with structured JSON output. `npx -y scout-cli`.
MCP server for ScraperAPI web scraping with JavaScript rendering, geotargeting, premium proxies, and auto-parsing support.
B2B lead generation with 20+ tools including Apollo, Google Maps, email finder, email validator, mobile finder, skip trace, and ecommerce store data.
Official MCP server for managing Searchcraft clusters, creating a search index, generating an index dynamically from a data file, and importing data into a search index from a feed or local JSON file.
An MCP server to connect to SearXNG instances
OpenGraph.io API integration for extracting OG metadata, taking screenshots, scraping web content, querying sites with AI, and generating branded images (illustrations, diagrams, social cards, icons, QR codes) with iterative refinement.
SerpApi MCP Server for Google and other search engine results. Provides multi-engine search across Google, Bing, Yahoo, DuckDuckGo, YouTube, eBay, and more with real-time weather data, stock market information, and flexible JSON response modes.
Structure any document, query it like a database. Open-source extraction engine that turns any document into typed, schema-defined records, queryable in natural language from Claude, ChatGPT, Gemini, or any MCP client.
Free URL metadata extraction API (Open Graph, Twitter Cards, favicons, JSON-LD). No API key required.
Search and discover rescue dogs from European and UK organizations with AI-powered personality matching and detailed profiles.
Convert any URL to clean, token-efficient Markdown for AI agents. API-backed extraction with token counting, CSS selector support, and configurable caching via [StripFeed](https://www.stripfeed.dev).
Get the LaTeX source of arXiv papers to handle mathematical content and equations
Schema-validated document extraction with searchable workspace memory. Extract structured fields from PDFs, scans, images, and forms; AI agents can also search, filter, and query past extractions.
An MCP Server that retrieves and processes news data from the GeekNews site.
MCP server for scraping Hacker News, Bluesky, and Substack with x402 micropayment support. Tools: hn_search, bluesky_search, substack_search. $0.05/call via USDC on Base.
Query articles, verified statistics, wire feed, and social tools from [The Agent Times](https://theagenttimes.com), the AI-native newspaper covering the agent economy. 13 tools including search, comments, citations, and agent leaderboards. No API key required.
A MCP server that provides gene set enrichment analysis using the Enrichr API
Tavily AI search API
A reliable MCP server for generating and managing screenshots, PDFs, and videos, performing AI-powered screenshot analysis, and extracting web content (Markdown, metadata, and HTML) via the [Urlbox](https://urlbox.com) API.
Comprehensive NCBI/PubMed literature search server with advanced analytics, caching, MeSH integration, related articles discovery, and batch processing for all life sciences and biomedical research.
Multi-provider web search with intelligent auto-routing (Serper, Tavily, Exa). Available via `uvx web-search-plus-mcp`.
Production-ready MCP server providing 13 tools for AI agents: web search, content extraction, screenshots, weather, finance, email validation, translation, news, GeoIP, WHOIS, DNS, PDF extraction, and QR code generation. 1,000 free calls/month, no setup required.
Smart web fetcher for AI agents with auto-escalation from HTTP to headless browser to stealth mode. Includes 9 MCP tools: fetch, search, crawl, map, extract, batch, screenshot, jobs, and agent. Achieved 100% success rate on a 30-URL benchmark.
MCP server that searches Baseline status using Web Platform API
This is a TypeScript-based MCP server that provides DuckDuckGo search functionality.
Comprehensive research tools including Google Search (web, news, images), web scraping with JavaScript rendering, academic paper search (arXiv, PubMed, IEEE), patent search, and YouTube transcript extraction.
MCP server that fetches YouTube video transcripts and optionally summarizes them. Supports multiple transcript formats (text, JSON, SRT, WebVTT), multi-language retrieval, and flexible YouTube URL parsing.
Query network asset information with the ZoomEye MCP server