GPTBot

Overview

GPTBot is OpenAI's web crawler that collects content from the public web for training and improving GPT models, including ChatGPT. It visits publicly accessible pages, reads their content, and processes it as potential training data.

GPTBot is one of the most discussed web crawlers due to the debate around AI training data. Many websites choose to block it via robots.txt, while others allow it for visibility in AI-generated responses. The decision to block or allow GPTBot has become a strategic choice for content publishers.

GPTBot is separate from OAI-SearchBot, which powers ChatGPT's real-time web search feature. They serve different purposes and can be controlled independently via robots.txt.

User-Agent String

GPTBot identifies itself with the following user-agent string:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)
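To recognize this string in your own code, a minimal sketch along the following lines may help. The function name gptbot_version and the regular expression are illustrative assumptions, not something OpenAI provides, and a user-agent header can be spoofed, so a match on its own does not prove a request came from OpenAI.

```python
import re
from typing import Optional

# Matches the "GPTBot/<version>" product token seen in the string above.
# The pattern is an assumption based on that one example string.
GPTBOT_TOKEN = re.compile(r"\bGPTBot/(\d+(?:\.\d+)*)")


def gptbot_version(user_agent: str) -> Optional[str]:
    """Return the GPTBot version found in a user-agent header, or None."""
    match = GPTBOT_TOKEN.search(user_agent)
    return match.group(1) if match else None


ua = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "GPTBot/1.3; +https://openai.com/gptbot)"
)
print(gptbot_version(ua))  # 1.3
print(gptbot_version("Mozilla/5.0 (X11; Linux x86_64)"))  # None
```

Matching on the GPTBot token rather than the full string keeps the check working if the version number changes.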

How GPTBot Handles OG Images

GPTBot is a content crawler, not a link preview bot. It reads page content for AI training, so OG images are not its primary focus.

Meta Tags Read

og:title, og:description

Preferred Size: N/A. GPTBot focuses on text content, not images.
Cache Duration: Unknown. OpenAI does not disclose crawl frequency or caching behavior.

Caching Behavior

Unknown. OpenAI does not disclose crawl frequency or caching behavior.

Fallback Behavior

GPTBot reads the full page content regardless of OG tags. OG tags may help it understand the page's topic but are not required.

Things to Know

GPTBot is specifically for AI training data collection and is separate from OAI-SearchBot, which powers ChatGPT search.
Blocking GPTBot does not prevent content that was already crawled from appearing in ChatGPT responses.
Many major publishers (NYT, Reuters, etc.) block GPTBot via robots.txt.
OpenAI publishes the IP ranges GPTBot uses, which you can use for server-level blocking.
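As a rough sketch of how a published range list could be used at the server level, the snippet below checks whether a request's source address falls inside a set of CIDR blocks. The ranges shown are placeholders taken from the reserved documentation address space, not OpenAI's real ranges, and the names EXAMPLE_RANGES and in_ranges are hypothetical. Substitute the list OpenAI publishes.

```python
import ipaddress
from typing import Iterable

# Placeholder CIDR blocks from the reserved documentation ranges.
# These are NOT OpenAI's ranges; replace them with the published list.
EXAMPLE_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def in_ranges(ip: str, cidrs: Iterable[str]) -> bool:
    """Return True if the address falls inside any of the CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    for cidr in cidrs:
        network = ipaddress.ip_network(cidr)
        # Compare only like with like (IPv4 vs IPv4, IPv6 vs IPv6).
        if addr.version == network.version and addr in network:
            return True
    return False


print(in_ranges("192.0.2.10", EXAMPLE_RANGES))    # True
print(in_ranges("198.51.100.7", EXAMPLE_RANGES))  # False
```

Checking the source address is harder to fake than the user-agent header, which is why the two checks are often combined.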

robots.txt

GPTBot fully respects robots.txt. OpenAI encourages site owners to use robots.txt to control GPTBot access, and many sites choose to block it.

# To allow GPTBot:
User-agent: GPTBot
Allow: /

# To block GPTBot:
User-agent: GPTBot
Disallow: /
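One way to sanity-check rules like these before deploying them is to run them through Python's standard-library robots.txt parser. The sketch below uses a hypothetical rule set that blocks GPTBot while allowing OAI-SearchBot, and a hypothetical helper named can_crawl. It shows how Python's parser reads the rules, which may differ in edge cases from how OpenAI's crawlers interpret them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block the training crawler, allow the search crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given rules let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(can_crawl(ROBOTS_TXT, "GPTBot", "https://example.com/post"))         # False
print(can_crawl(ROBOTS_TXT, "OAI-SearchBot", "https://example.com/post"))  # True
```

Because each crawler has its own User-agent group, changing the rule for one does not affect the other.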

How to Test

OpenAI does not provide a debugger tool, so testing relies on your own server data. OpenAI also publishes the IP ranges used by GPTBot, which you can compare against the source addresses in your logs.

Check your server logs for requests with 'GPTBot' in the user-agent string.
Review your robots.txt to see if GPTBot is already being blocked or allowed.
Consider your content strategy: allowing GPTBot may increase visibility in AI responses, while blocking it protects your content from being used as training data.
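The log check in the first step can be sketched as a short script. It assumes your server writes the common "combined" access log format, with the request line as the first quoted field and the user agent as the last. The sample lines are made up for illustration, and the name gptbot_hits is hypothetical. Adjust the pattern if your log format differs.

```python
import re
from collections import Counter

# Assumed layout: ... "METHOD /path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def gptbot_hits(lines):
    """Count requests per path whose user agent mentions GPTBot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "GPTBot" in match.group("ua"):
            hits[match.group("path")] += 1
    return hits


# Made-up sample lines for illustration only.
sample = [
    '192.0.2.10 - - [01/Jan/2025:00:00:00 +0000] "GET /blog/post HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; '
    'compatible; GPTBot/1.3; +https://openai.com/gptbot)"',
    '192.0.2.11 - - [01/Jan/2025:00:00:01 +0000] "GET /about HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for path, count in gptbot_hits(sample).items():
    print(path, count)  # /blog/post 1
```

To run it against a real file, pass an open file object instead of the sample list, since the function only needs an iterable of lines.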

You can also use the MyOG OG Preview tool to check how your OG tags are configured before testing with GPTBot.

FAQ

What is GPTBot?

GPTBot is OpenAI's web crawler that collects publicly accessible content from the web for training and improving GPT models, including ChatGPT. It respects robots.txt and can be blocked by site owners who don't want their content used for AI training.

Should I block GPTBot?

It depends on your goals. Blocking GPTBot prevents your content from being used as future training data for GPT models. Allowing it may increase the chance your content is referenced in AI-generated responses. Many publishers block it, while others see value in AI visibility. This is a strategic decision.

What is the difference between GPTBot and OAI-SearchBot?

GPTBot collects training data for improving GPT models. OAI-SearchBot powers ChatGPT's real-time web search feature and fetches pages when users ask ChatGPT to search the web. They serve different purposes and can be controlled independently via robots.txt.
