Reddit Files Copyright Lawsuit Against AI Firm Perplexity Over Data Scraping Allegations

Legal Action Against AI Search Engine

Social media platform Reddit has filed a copyright lawsuit against artificial intelligence company Perplexity in New York federal court, according to court documents. The complaint alleges that Perplexity illegally scraped Reddit’s data to train the model powering its AI search engine, marking the latest in a series of legal confrontations between content owners and AI companies over copyrighted material.

Multiple Defendants Named in Complaint

Sources indicate that Reddit’s legal action extends beyond Perplexity to include three additional entities: Lithuanian data scraping service Oxylabs UAB, former Russian botnet operation AWMProxy, and Texas-based startup SerpApi. The complaint alleges these groups provided data-scraping services that collected copyrighted Reddit content while concealing their identities and locations and disguising web scrapers as regular human users.

According to the filing, Reddit claims Perplexity was “a willing customer of at least one of its co-defendants” and reportedly needed to “fuel its answer engine” by scraping data through Google search results. The social media company’s chief legal officer Ben Lee stated in the complaint that “AI companies are locked in an arms race for quality human content,” suggesting this competition has fueled what he described as an industrial-scale “data laundering” economy.

Defense and Previous Negotiations

SerpApi has publicly responded to the allegations, with company representatives stating they “strongly disagree with Reddit’s allegations and intend to vigorously defend ourselves in court.” Meanwhile, two people familiar with the matter told the Financial Times that Reddit had previously confronted Perplexity about the alleged data theft and reportedly suggested entering discussions about a paid partnership, but that Perplexity founder Aravind Srinivas showed no interest.

The same sources indicated that Reddit had also contacted Google with its concerns, asking the technology giant to investigate whether Perplexity was scraping Reddit’s proprietary data through its search engine and, if confirmed, to determine methods to prevent such activity. A Google spokesperson reportedly declined to comment on the matter.

Broader Context of AI Copyright Litigation

Analysts suggest this lawsuit adds to dozens of copyright cases filed against AI companies since the emergence of generative AI systems. These technologies typically require training on vast amounts of text data, including content sourced from the internet. Copyright holders across multiple industries have claimed their content has been used without proper consent or fair compensation.

Reddit, which completed its initial public offering in March 2024 and is known for hosting extensive online communities, has previously established multimillion-dollar partnerships with both Google and OpenAI. These agreements reportedly allow the technology companies to train their large language models on Reddit content through official channels.

Contrasting Data Access Methods

The report states that Reddit alleges the defendants circumvented data protection measures to obtain copyrighted material without permission, contrasting this approach with the company’s authorized partnerships. Lee emphasized that Reddit represents “one of the largest and most dynamic collections of human conversation ever created,” making it what he characterized as a prime target for unauthorized data collection.

This legal action follows a similar lawsuit Reddit filed against AI startup Anthropic in June, alleging that the company had scraped its platform more than 100,000 times since July 2024. At that time, Anthropic responded that it “disagreed” with Reddit’s claims and would “defend ourselves vigorously.”

Perplexity and Oxylabs did not immediately respond to requests for comment regarding the current lawsuit, while AWMProxy could not be reached for comment, according to reports.

Reddit continues to navigate the complex intersection of artificial intelligence development and content rights as the industry grapples with appropriate data usage standards for training search engines and other AI systems.