Reddit sues Anthropic over AI data scraping

by ccadm


Reddit is accusing Anthropic of building its Claude AI models on the back of Reddit’s users, without permission and without paying for it.

Anyone who uses Reddit, even a web-crawling bot, agrees to the site’s user agreement. That agreement is clear: you cannot just take content from the site and use it for your own commercial products without a written deal. Reddit claims Anthropic’s bots have been doing exactly that for years, scraping massive amounts of conversations and posts to train and improve Claude.

What makes this lawsuit particularly spicy is the way it goes after Anthropic’s reputation. Anthropic has worked hard to brand itself as the ethical, trustworthy AI company, the “white knight” of the industry. The lawsuit, however, calls these claims nothing more than “empty marketing gimmicks”.

For instance, Reddit points to a statement from July 2024 in which Anthropic claimed it had stopped its bots from crawling Reddit. The lawsuit says this was “false”, alleging that Reddit’s logs caught Anthropic’s bots trying to access the site more than one hundred thousand times in the following months.

But this isn’t just about corporate squabbles; it directly involves user privacy. When you delete a post or a comment on Reddit, you expect it to be gone. Reddit has official licensing deals with other big AI players like Google and OpenAI, and these deals include technical measures to ensure that when a user deletes content, the AI company does too.

According to Reddit’s lawsuit, Anthropic has no such deal and has refused to enter one. This means if its AI was trained on a post you later deleted, that content could still be baked into Claude’s knowledge base, effectively ignoring your choice to remove it. The lawsuit even includes a screenshot where Claude itself admits it has no real way of knowing if the Reddit data it was trained on was later deleted by a user.

So, what does Reddit want? It’s not just about money, although they are asking for damages for things like increased server costs and lost licensing fees. They are asking the court for an injunction to force Anthropic to stop using any Reddit data immediately.

Furthermore, Reddit wants to prohibit Anthropic from selling or licensing any product that was built using that data. That means they’re asking a judge to effectively take Claude off the market.

This case forces a tough question: Does being “publicly available” on the internet mean content is free for any corporation to take and monetise? Reddit is arguing a firm “no,” and the outcome could change the rules for how AI is developed from here on out.

(Photo by Brett Jordan)
