AI Crawler · Anthropic

ClaudeBot

Anthropic's web crawler used to gather training data for Claude AI models.

Overview

ClaudeBot is Anthropic's web crawler that collects publicly accessible content to train and improve Claude, Anthropic's AI assistant. It visits web pages, reads their content, and processes it as potential training data for Claude's language models.

Like OpenAI's GPTBot, ClaudeBot has become part of the broader conversation about AI training data and web crawling ethics. Anthropic is transparent about the crawler's purpose and encourages site owners to use robots.txt to control access.

ClaudeBot is designed to be a responsible crawler: it respects robots.txt, is thoughtful about crawl frequency, respects Crawl-delay directives, and identifies itself clearly in the user-agent string. It crawls publicly accessible content only and does not bypass paywalls or authentication.

User-Agent String

ClaudeBot identifies itself with the following user-agent string:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)
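Because the "ClaudeBot" token always appears in the string, identifying the crawler is a simple substring match. A minimal shell sketch (the sample string is the published user-agent shown above):

```shell
# Classify a user-agent string by matching the "ClaudeBot" token.
ua='Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)'
case "$ua" in
  *ClaudeBot*) echo "claudebot" ;;
  *)           echo "other" ;;
esac
```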

How ClaudeBot Handles OG Images

ClaudeBot is a content crawler for AI training, not a link preview bot. It reads page content rather than focusing on OG images.

Meta Tags Read: og:title, og:description
Preferred Size: N/A (ClaudeBot focuses on text content, not images)
Cache Duration: Unknown. Anthropic does not disclose crawl frequency or caching behavior.

Caching Behavior

Unknown. Anthropic does not disclose crawl frequency or caching behavior.

Fallback Behavior

ClaudeBot reads the full page content. OG tags may help it understand the page topic but are not required.

Things to Know

  • ClaudeBot is thoughtful about crawl frequency and respects Crawl-delay directives where appropriate.
  • Anthropic publishes the IP ranges used by ClaudeBot for server-level access control.
  • Blocking ClaudeBot does not remove previously crawled content from Claude's training data.
  • For best results, ensure your content is available in the initial HTML rather than relying on client-side JavaScript rendering.
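For server-level control by user-agent rather than IP range, the block can also happen at the web-server layer. A minimal nginx sketch (assumes an nginx server block; robots.txt remains the opt-out mechanism Anthropic encourages):

# Inside a server or location block: return 403 to ClaudeBot requests.
if ($http_user_agent ~* "claudebot") {
    return 403;
}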

robots.txt

Fully respects robots.txt. Anthropic encourages site owners to use robots.txt to opt out of ClaudeBot crawling.

# To allow ClaudeBot:
User-agent: ClaudeBot
Allow: /

# To block ClaudeBot:
User-agent: ClaudeBot
Disallow: /
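Because ClaudeBot respects Crawl-delay, throttling is an option between fully allowing and fully blocking. A robots.txt sketch (the 10-second value is an arbitrary example, not an Anthropic recommendation):

# To slow ClaudeBot down without blocking it:
User-agent: ClaudeBot
Crawl-delay: 10
Allow: /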

How to Test

Anthropic does not provide a debugger tool. Check your server access logs for the ClaudeBot user-agent to see if it's crawling your site.

  • Check your server logs for requests with 'ClaudeBot' in the user-agent string.
  • Review your robots.txt to confirm your intended policy for ClaudeBot.
  • If you want to block ClaudeBot at the server level, Anthropic publishes the IP ranges used by the crawler.
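The log check in the steps above can be sketched as a pair of one-liners. This example runs against a synthetic log line in combined log format; in practice, point LOG at your server's real access log (the path /var/log/nginx/access.log is an assumption and varies by server):

```shell
# Build a sample access log in combined log format (for illustration);
# replace $LOG with your server's real access-log path.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
203.0.113.5 - - [01/Jan/2025:00:00:00 +0000] "GET /article HTTP/1.1" 200 1234 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)"
198.51.100.7 - - [01/Jan/2025:00:01:00 +0000] "GET /about HTTP/1.1" 200 999 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
EOF

# Count ClaudeBot requests, then list which paths it fetched.
grep -c 'ClaudeBot' "$LOG"
grep 'ClaudeBot' "$LOG" | awk '{print $7}' | sort | uniq -c | sort -rn
```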

You can also use the MyOG OG Preview tool to check how your OG tags are configured before testing with ClaudeBot.

FAQ

What is ClaudeBot?

ClaudeBot is Anthropic's web crawler that collects publicly accessible content from the web for training Claude AI models. It respects robots.txt and uses conservative rate limiting to minimize impact on servers.

Should I block ClaudeBot?

This is a strategic decision. Blocking ClaudeBot prevents future training data collection from your site. Allowing it may improve Claude's knowledge of your domain. Some publishers block all AI crawlers, while others selectively allow specific ones. Check your content strategy and any legal or policy considerations.

Is ClaudeBot the same as Claude?

No. ClaudeBot is the web crawler that collects training data. Claude is the AI assistant built by Anthropic. ClaudeBot's crawling contributes to training Claude, but they are separate systems. Blocking ClaudeBot doesn't affect your ability to use Claude.
