jumpingrabbit/robots.txt


robots.txt

A robots.txt that allows indexing by search crawlers but disallows harmful AI training bots, which can take your content for model training without your consent:

Sitemap: https://[your domain name here]/sitemap.xml
Sitemap: https://[your domain name here]/image-sitemap.xml

# Disallow data scraping and usage of website content for AI model training or prompting.
# Explicit opt-out from certain crawlers is not an invitation for others to train AI models on our content.
# Data scraping and model training must be opt-in, not opt-out.
# Demand consent, credit, and compensation.
# #CreateDontScrape

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
Disallow: /[Anything to exclude from indexing]
Host: https://[your domain name here without a closing slash]

Edit the placeholders above and place the file on your website at https://[your domain name here]/robots.txt.
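Once the file is in place, you can sanity-check the rules with Python's standard-library robots.txt parser. This is a minimal sketch: it parses the directives above inline (with `https://example.com` as a placeholder domain) and confirms the AI training bots are denied while ordinary search crawlers, matched by the `*` group, remain allowed.

```python
from urllib import robotparser

# The user-agent rules from the robots.txt above, inlined for testing.
# "example.com" is a placeholder; substitute your own domain.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# AI training bots are denied everywhere...
print(parser.can_fetch("GPTBot", "https://example.com/article.html"))    # False
print(parser.can_fetch("CCBot", "https://example.com/"))                 # False
# ...while regular search crawlers fall through to the * group and are allowed.
print(parser.can_fetch("Googlebot", "https://example.com/article.html"))  # True
```

Note that this only checks the rules' logic; compliance is still up to each crawler, which is why the comments above stress consent rather than enforcement.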

Create an issue if you think more bots need to be added.

About

A robots.txt that disallows harmful AI training bots


License

Unlicense license
