OpenAI published a way to block their crawler from accessing your website. All you need to do is add the following to the robots.txt file in the root of your website (or create that file if it doesn't exist yet):
User-agent: GPTBot
Disallow: /
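If you want to double-check the rule before relying on it, Python's standard-library urllib.robotparser can evaluate it locally. This is a minimal sketch, assuming the two-line robots.txt above. The is_allowed helper, the "SomeOtherBot" name and the example.com URL are hypothetical placeholders. Keep in mind that robots.txt is advisory, so this only shows what a crawler that honors the file should do.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from above. Swap in your own robots.txt contents to test them.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Hypothetical helper: True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder; use URLs from your own site.
    for agent in ("GPTBot", "SomeOtherBot"):
        print(agent, is_allowed(agent, "https://example.com/blog/some-post"))
    # prints:
    # GPTBot False
    # SomeOtherBot True
```

Note that matching is done on the crawler's short name ("GPTBot"), not its full user-agent string, and any bot that isn't named in the file is allowed by default.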
The robots.txt rule tells GPTBot not to crawl your site, which keeps OpenAI from training its models on your content. There's only a tiny bit of content on this blog, so the impact is very limited. Still, I've decided to stop OpenAI from crawling my website. Why bother? Let me explain.
In all fairness: kudos to them for at least giving you a way to opt out of their efforts to swallow up the internet. But the cynic in me can't help but note that they only did this after they had already gone through everything to train their current models.
I'm aware that plenty of other crawlers have come through here and use data from the internet for their own gain. But not many of them crawl the internet and remix everything they've learned into something presented as if an AI came up with it. Which brings me to my next argument:
Websites like Wikipedia, or even Google, offer tons of knowledge to go through. But there's always a way to figure out where that knowledge comes from. If you google something, you see the websites that host the data you searched for. On Wikipedia, you can check the sources for every claim made on a page. You can even see who wrote it down.
There's no such thing when you ask ChatGPT. If you ask it for a good pea soup recipe, it'll throw something together based on all the pea soup recipes it stole (ahem, found). If you ask it for information about your favorite football team, it won't tell you to take a look at the team's website.
Finally: when OpenAI started out as a non-profit, its mission was to make sure AI would be used for good. It's still early, but if we take stock now, that mission seems to have been a bit too superhero-y after all.
Now that they're a for-profit company, it seems the good they meant was the good of their own bottom line. Don't get me wrong: I don't mind a company making money. I've done that every now and then as well! But making money by basically stealing other people's content and art is just not cool. Bad Sam Altman, very bad!