Cloudflare launches a tool to combat AI bots


Cloudflare, the publicly traded cloud service provider, has launched a new, free tool to prevent bots from scraping websites hosted on its platform for data to train AI models.

Some AI vendors, including Google, OpenAI and Apple, allow website owners to block the bots they use for data scraping and model training by amending their site’s robots.txt, the text file that tells bots which pages they can access on a website. But, as Cloudflare points out in a post announcing its bot-combatting tool, not all bots respect this.
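
To make the robots.txt mechanism concrete, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. The crawler tokens `GPTBot`, `Google-Extended` and `Applebot-Extended` are the ones OpenAI, Google and Apple publish for their AI-related crawlers; the sample file, the `is_allowed` helper and the `example.com` URL are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block three AI-related crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # hypothetical URL
    for agent in ("GPTBot", "Google-Extended", "Applebot-Extended", "Mozilla/5.0"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

Note that the file is purely advisory: a crawler that never reads it, or that announces itself with a browser-like user agent such as `Mozilla/5.0`, is not stopped by these rules, which is the gap Cloudflare says its tool targets.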

“Customers don’t want AI bots visiting their websites, and especially those that do so dishonestly,” the company writes on its official blog. “We fear that some AI companies intent on circumventing rules to access content will persistently adapt to evade bot detection.”

So, in an attempt to address the problem, Cloudflare analyzed AI bot and crawler traffic to fine-tune an automatic bot detection model. The model considers, among other factors, whether an AI bot might be trying to evade detection by mimicking the appearance and behavior of someone using a web browser.

“When bad actors attempt to crawl websites at scale, they generally use tools and frameworks that we are able to fingerprint,” Cloudflare writes. “Based on these signals, our models [are] able to appropriately flag traffic from evasive AI bots as bots.”
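
As a rough illustration of what scoring traffic "based on these signals" could look like, here is a toy Python sketch. It is not Cloudflare's model, which the company describes only in general terms; the `Request` type, the marker list, the weights and the rate threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical, simplified view of an incoming HTTP request."""
    user_agent: str
    headers: dict = field(default_factory=dict)
    requests_per_minute: float = 0.0


# Substrings that appear in the default user agents of some automation tools.
# Which signals a production system actually uses is an assumption here.
AUTOMATION_MARKERS = ("headlesschrome", "python-requests", "scrapy")


def bot_score(req: Request) -> float:
    """Return a score in [0, 1]; higher means more bot-like (toy weights)."""
    score = 0.0
    ua = req.user_agent.lower()

    # Signal 1: the user agent names a known automation tool.
    if any(marker in ua for marker in AUTOMATION_MARKERS):
        score += 0.5

    # Signal 2: claims to be a browser but omits a header browsers usually send.
    header_names = {name.lower() for name in req.headers}
    if "mozilla" in ua and "accept-language" not in header_names:
        score += 0.3

    # Signal 3: request rate far above typical human browsing.
    if req.requests_per_minute > 60:
        score += 0.2

    return min(score, 1.0)


if __name__ == "__main__":
    human = Request("Mozilla/5.0 (Windows NT 10.0) Chrome/126.0",
                    {"Accept-Language": "en-US"}, 3)
    evasive = Request("Mozilla/5.0 (X11; Linux x86_64) HeadlessChrome/126.0",
                      {}, 300)
    print(bot_score(human), bot_score(evasive))
```

A real detector would presumably learn its weights from labeled traffic rather than hard-code them; the sketch only shows how several weak signals can combine to flag a client that presents itself as a browser.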

Cloudflare has set up a form for hosts to report suspected AI bots and crawlers and says that it’ll continue to manually blacklist new AI bots over time.

The problem of AI bots has come into sharp relief as the generative AI boom fuels the demand for AI model training data.

Many sites, wary of AI vendors training models on their content without alerting or compensating them, have opted to block AI scrapers. Around 26% of the top 1,000 sites on the web have blocked OpenAI’s bot, according to one study; another found that more than 600 major news publishers had blocked the bot.

Blocking isn’t surefire, however. As alluded to earlier, some vendors appear to be ignoring standard exclusion rules to gain a competitive advantage. AI search engine Perplexity was recently accused of impersonating legitimate visitors to scrape content from websites.

Tools like Cloudflare’s could help — but only if they prove to be accurate in detecting clandestine AI bots.
