Reddit says Microsoft’s Bing, Anthropic, and Perplexity have scraped its data without permission. “It has been a real pain in the ass to block these companies.”
Reddit says Microsoft’s Bing, Anthropic, and Perplexity have scraped its data without permission. “It has been a real pain in the ass to block these companies.”
Well Reddit should just sue these companies and see if these companies are actually breaking any laws. Holding sizeable chunk of the internet hostage also sounds like something the EU and US might want to look in to as it very much sounds like anti-competitive conduct or market manipulation.
Also if these companies want to have greater ownership over the content generated by their users they should also be much more liable for the content posted to their sites. I mean when something like the Section 230 was written they probably did not take this in to account. If these companies want to start selling user generated content then they should simply lose the immunity from liability.
Reddit would lose badly that’s why they don’t sue. US’ 9th circuit ruled that scraping Linkedin is legal and Bing is not even scraping but indexing the data. Easiest case ever.
It’s almost impossible to block web scraping especially someone with Microsoft or Perplexity resources.
Its clearly an attempt to blackmail indexers into license deal as paying something to reddit could be actually cheaper than battling anti robots.
why do people insist on making me defend reddit.
While I don’t disagree with the general idea, Section 230 would introduce an uncontrollable risk into running any website with user-generated content and would essentially shut them down.
If the site isn’t selling data, they wouldn’t lose 230 protection. So that would only be a risk for the companies selling their users’ data, not your regular forum or something.
That gets really murky though. For example:
It’s a lot more obvious for social media sites like Facebook since user-generated content is the service, but there are a lot of for-profit entities where user-generated content is highly relevant, but not the core service. Would those sites be essentially forced to either moderate or eliminate user interaction?
There’s a lot of complexity here.