Overview
GPTBot is OpenAI's web crawler that collects content from the public web for training and improving the GPT models behind products such as ChatGPT. It visits publicly accessible pages, reads their content, and processes it as potential training data.
GPTBot is one of the most discussed web crawlers due to the debate around AI training data. Many websites choose to block it via robots.txt, while others allow it for visibility in AI-generated responses. The decision to block or allow GPTBot has become a strategic choice for content publishers.
GPTBot is separate from OAI-SearchBot, which powers ChatGPT's real-time web search feature. They serve different purposes and can be controlled independently via robots.txt.
User-Agent String
GPTBot identifies itself with the following user-agent string:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)
How GPTBot Handles OG Images
GPTBot is a content crawler, not a link preview bot. It reads page content for AI training — OG images are not its primary focus.
Meta Tags Read
og:title, og:description
| Preferred Size | N/A (GPTBot focuses on text content, not images) |
| Cache Duration | Unknown. OpenAI does not disclose crawl frequency or caching behavior. |
Caching Behavior
Unknown. OpenAI does not disclose crawl frequency or caching behavior.
Fallback Behavior
GPTBot reads the full page content regardless of OG tags. OG tags may help it understand the page's topic but are not required.
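If you still want to confirm that a page carries the two tags listed under "Meta Tags Read", the following is a minimal sketch using Python's standard-library `html.parser`. It is a generic check of your own markup, not a model of how GPTBot parses pages; the names `OGTagReader` and `missing_og_tags` are hypothetical.

```python
from html.parser import HTMLParser

# The two OG tags this page lists under "Meta Tags Read".
WANTED = ("og:title", "og:description")


class OGTagReader(HTMLParser):
    """Collect the content of <meta property="og:..."> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; self-closing
        # <meta ... /> tags are routed here as well.
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        content = attrs.get("content")
        if prop.startswith("og:") and content is not None:
            self.og.setdefault(prop, content)  # keep the first occurrence


def missing_og_tags(html: str) -> list[str]:
    """Return the wanted OG tags that are absent or empty in `html`."""
    reader = OGTagReader()
    reader.feed(html)
    reader.close()
    return [name for name in WANTED if not reader.og.get(name, "").strip()]


page = """<html><head>
<meta property="og:title" content="GPTBot overview">
<meta property="og:description" content="">
</head><body>...</body></html>"""
print(missing_og_tags(page))  # ['og:description']
```

An empty `content` attribute is treated as missing, since it gives a crawler nothing to read.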
Things to Know
- GPTBot is used specifically to collect AI training data; it is separate from OAI-SearchBot, which powers ChatGPT search.
- Blocking GPTBot only stops future crawling. Content it collected before the block is not removed, so it may still be reflected in ChatGPT responses.
- Many major publishers (NYT, Reuters, etc.) block GPTBot via robots.txt.
- OpenAI publishes the IP ranges GPTBot crawls from, which you can use for server-level blocking or to check that a request claiming to be GPTBot is genuine.
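Checking a request against published IP ranges comes down to CIDR membership. The following is a minimal sketch using Python's `ipaddress` module. The prefixes below are hypothetical placeholders drawn from the RFC 5737 documentation ranges, not OpenAI's real ranges, and `is_gptbot_ip` / `is_spoofed_gptbot` are illustrative names.

```python
import ipaddress

# Hypothetical placeholder prefixes taken from the RFC 5737 documentation
# ranges. Replace them with the ranges OpenAI currently publishes for GPTBot.
GPTBOT_PREFIXES = ["192.0.2.0/24", "198.51.100.0/25"]
GPTBOT_NETWORKS = [ipaddress.ip_network(p) for p in GPTBOT_PREFIXES]


def is_gptbot_ip(addr: str) -> bool:
    """True if `addr` lies inside one of the configured GPTBot prefixes."""
    ip = ipaddress.ip_address(addr)
    # `in` is simply False when the IP version differs from the network's.
    return any(ip in net for net in GPTBOT_NETWORKS)


def is_spoofed_gptbot(user_agent: str, addr: str) -> bool:
    """True if a request claims to be GPTBot but comes from outside the ranges."""
    return "GPTBot" in user_agent and not is_gptbot_ip(addr)


print(is_gptbot_ip("192.0.2.10"))      # True
print(is_gptbot_ip("198.51.100.200"))  # False: outside the /25
print(is_spoofed_gptbot("Mozilla/5.0 (compatible; GPTBot/1.3)", "203.0.113.5"))  # True
```

Because any client can send a GPTBot user-agent string, pairing the user-agent check with the IP check is what makes a block or an allow rule trustworthy.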
robots.txt
GPTBot respects robots.txt, and OpenAI encourages site owners to use robots.txt to control its access. Many sites choose to block it.
# To allow GPTBot:
User-agent: GPTBot
Allow: /
# To block GPTBot:
User-agent: GPTBot
Disallow: /
How to Test
OpenAI does not provide a debugger tool. Check your server access logs for the GPTBot user-agent to see if it's crawling your site. OpenAI also publishes the IP ranges used by GPTBot.
- Check your server logs for requests with 'GPTBot' in the user-agent string.
- Review your robots.txt to see if GPTBot is already being blocked or allowed.
- Consider your content strategy: allowing GPTBot may increase visibility in AI responses, while blocking it keeps your content out of future training data.
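The log check in the first step can be scripted. The following is a minimal sketch in Python, assuming your server writes the Apache/Nginx "combined" log format; the regular expression, the `gptbot_hits` name, and the sample lines are illustrative assumptions, so adjust the pattern to your own log layout.

```python
import re
from collections import Counter

# Assumes the Apache/Nginx "combined" log format; adjust for other formats.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
GPTBOT_TOKEN = re.compile(r"GPTBot/(?P<version>[\d.]+)")


def gptbot_hits(lines):
    """Yield (ip, path, status, version) for each log line whose user agent names GPTBot."""
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # not in combined format; skip it
        bot = GPTBOT_TOKEN.search(m.group("agent"))
        if bot:
            yield m.group("ip"), m.group("path"), int(m.group("status")), bot.group("version")


sample = [
    '192.0.2.10 - - [10/Oct/2024:13:55:36 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)"',
    '203.0.113.7 - - [10/Oct/2024:13:56:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]
hits = list(gptbot_hits(sample))
print(hits)  # [('192.0.2.10', '/blog/post', 200, '1.3')]
print(Counter(path for _, path, _, _ in hits).most_common(5))  # [('/blog/post', 1)]
```

To scan a real file, pass an open file object (for example `open("access.log")`) to `gptbot_hits`. A user-agent match alone can be spoofed, so cross-check the IPs against the ranges OpenAI publishes.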
You can also use the MyOG OG Preview tool to check how your OG tags are configured before testing with GPTBot.
FAQ
What is GPTBot?
GPTBot is OpenAI's web crawler that collects publicly accessible content from the web for training and improving the GPT models behind products such as ChatGPT. It respects robots.txt and can be blocked by site owners who don't want their content used for AI training.
Should I block GPTBot?
It depends on your goals. Blocking GPTBot prevents your content from being used as future training data for GPT models. Allowing it may increase the chance your content is referenced in AI-generated responses. Many publishers block it, while others see value in AI visibility. This is a strategic decision.
What is the difference between GPTBot and OAI-SearchBot?
GPTBot collects training data for improving GPT models. OAI-SearchBot powers ChatGPT's real-time web search feature — it fetches pages when users ask ChatGPT to search the web. They serve different purposes and can be controlled independently via robots.txt.
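Because the two crawlers use different user-agent tokens, one robots.txt can treat them differently. The following sketch uses Python's standard-library `urllib.robotparser` to check an example policy that opts out of training but stays reachable for search. The policy and the `example.com` URL are illustrative, and Python's parser is one implementation of the robots.txt rules, not OpenAI's own.

```python
from urllib.robotparser import RobotFileParser

# Example policy: opt out of training (GPTBot) but stay reachable
# for ChatGPT search (OAI-SearchBot).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/articles/1"
for agent in ("GPTBot", "OAI-SearchBot"):
    print(agent, parser.can_fetch(agent, url))
# GPTBot False
# OAI-SearchBot True
```

Python's parser compares only the text before the first `/` of the agent you pass, lowercased, against each `User-agent` group. Pass the product token (`GPTBot`) rather than the full user-agent string, which begins with `Mozilla` and would match neither group.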