- Perplexity has been seen ignoring signals such as robots.txt to scrape websites
- It even found Cloudflare's protected, hidden test sites
- OpenAI adheres to responsible crawling, but Perplexity is staying quiet for now
Cloudflare has accused AI giant Perplexity of scraping websites that explicitly disallowed it through robots.txt and other network-level rules, hiding its identity and carrying out stealth crawling activity.
The company's researchers said they observed Perplexity using multiple user agents, including one that impersonates Google Chrome on macOS, as well as rotating IP addresses and ASNs to evade detection.
Alarmingly, Cloudflare detected millions of daily requests across tens of thousands of domains, highlighting the scale of illegitimate scraping by one of the largest companies in the space.
Perplexity is scraping sites it shouldn't be
According to Cloudflare's analysis, in many cases Perplexity ignored or did not fetch robots.txt files, which are plain text files placed at the root of a site to tell automated agents (such as search engines, AI crawlers and link checkers) which URLs may or may not be fetched.
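To illustrate how a well-behaved crawler might consult robots.txt before fetching a page, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rules, the `ExampleBot` user agent, the `example.com` URLs and the helper names are all hypothetical and do not come from Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one named bot is blocked everywhere,
# every other agent is blocked only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

for agent in ("ExampleBot", "OtherBot"):
    for url in ("https://example.com/page", "https://example.com/private/x"):
        verdict = "allowed" if may_fetch(parser, agent, url) else "disallowed"
        print(f"{agent} -> {url}: {verdict}")
```

Note that robots.txt is purely advisory: the check only has an effect if the crawler performs it and honors the answer under its real, declared user agent.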
Tellingly, Perplexity also tried to access test websites that Cloudflare had created, even though they were blocked through robots.txt and not publicly discoverable, while using undeclared crawlers that were not even associated with its official IP range.
“Although Perplexity initially crawls from their declared user agent, when they are presented with a network block, they appear to obscure their crawling identity in an attempt to circumvent the website's preferences,” the researchers wrote.
In response to its findings, Cloudflare has removed Perplexity's bots from its list of verified bots. The company has also added new heuristics to its managed rules to detect and block stealth crawling.
In contrast, OpenAI's crawlers have so far respected robots.txt.
Perplexity denied wrongdoing, calling Cloudflare's post a “sales pitch” and adding that the identified bots were not even its own. TechRadar Pro has asked Perplexity for comment.
Cloudflare urges bot operators to respect website preferences by being transparent, being well-behaved, serving a clear purpose, using separate bots for separate activities, and following rules and signals such as robots.txt.