Internet infrastructure company Cloudflare restricts AI crawlers from gathering data from websites without explicit authorization.

Publishing victory secured for media outlets.

Cloudflare's protective measures guard web content against data mining by unauthorized AI bots.

Cloudflare, a leading internet infrastructure company, has announced a significant shift in its approach to web crawling. In a move that could reshape the future of the Web, the company will now require opt-in permission from site owners before allowing AI bots to scrape data from their sites.

This new policy comes as several major publishers and platforms, including the Associated Press, Time, The Atlantic, and Reddit, have signed on with Cloudflare. The shift toward AI-mediated browsing is unlikely to reverse anytime soon, and the interface of the future Web may look more like ChatGPT than a spartan search box and ten blue links.

Previously, website owners using Cloudflare could choose to block AI bots, but if they did nothing, crawlers were permitted to scrape their sites. The new policy flips that default: AI crawlers are blocked unless the site owner actively allows them.

Cloudflare's pay-per-crawl experiment raises questions about how pricing tiers might be set for publishers; major publications, for instance, might charge different prices per crawl than local newspapers. Entrepreneur Bill Gross argues that AI bots are shoplifting and should pay up, while Shayne Longpre, a PhD candidate at MIT, thinks that pushback against crawlers threatens the transparency and open borders of the Web.

Pay-per-crawl allows domain owners to set a flat, per-request price across their entire site. This could distribute revenue from data scraping more equitably than the current system, in which large tech companies often reap most of the benefits.
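As a rough illustration of what a flat, per-request price implies, the sketch below decides whether a single crawler request is served or refused. Everything in it is an assumption made for illustration: the function names, the idea that a crawler declares a maximum price it will pay, and the use of HTTP status 402 (Payment Required) to signal a refusal are not details confirmed by this article.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class CrawlDecision:
    status: int        # HTTP status the site would return (assumed: 200 or 402)
    charged: Decimal   # amount billed for this one request


def decide(site_price: Decimal, crawler_max_price: Optional[Decimal]) -> CrawlDecision:
    """Apply one flat, site-wide price to a single crawler request (hypothetical)."""
    # A crawler that declares no budget, or a budget below the flat price, is not served.
    if crawler_max_price is None or crawler_max_price < site_price:
        return CrawlDecision(status=402, charged=Decimal("0"))
    # Otherwise the page is served and the flat price is charged.
    return CrawlDecision(status=200, charged=site_price)


def total_cost(site_price: Decimal, requests: int) -> Decimal:
    """Cost of fetching `requests` pages when every page costs the same."""
    return site_price * requests
```

Because the price is flat, a crawler's bill scales linearly with the number of pages it fetches: at a hypothetical price of 0.01 per request, 1,000 requests cost 10. Using `Decimal` rather than floats avoids rounding drift when many small charges are summed.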

Not all crawlers are AI bots; some enhance security, archive webpages, or index them for search engines. Some AI crawlers, however, have been found to ignore the Robots Exclusion Protocol, the robots.txt convention that tells crawlers which parts of a website they may access. An analysis by developer Robb Knight found that Perplexity's crawler ignores robots.txt files, despite Perplexity's claims to the contrary.
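To make the Robots Exclusion Protocol concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser` module. The user-agent names `ExampleAIBot` and `SomeSearchBot` and the rules themselves are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named AI bot is barred from the whole site,
# while every other crawler is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
print(parser.can_fetch("ExampleAIBot", "https://example.com/articles/1"))   # False
print(parser.can_fetch("SomeSearchBot", "https://example.com/articles/1"))  # True
print(parser.can_fetch("SomeSearchBot", "https://example.com/private/x"))   # False
```

Nothing in this check is enforced by the server. It only has an effect if the crawler chooses to run it, which is why a crawler that ignores robots.txt can still reach pages the site owner meant to exclude.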

Last year, Wired caught Perplexity trespassing on its website and those of other Condé Nast publications. Cloudflare can accommodate benevolent bots by allowing domain owners to selectively bypass payment. The company can also punish misbehaving bots by trapping them in an AI Labyrinth.
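Selectively bypassing payment amounts to a per-crawler rule table. The sketch below is one hypothetical way to model it; the crawler names, the three actions, and the choice to block any crawler not listed are illustrative assumptions, not Cloudflare's actual configuration format.

```python
from enum import Enum
from typing import Dict


class Action(Enum):
    ALLOW_FREE = "allow_free"  # benevolent bot: payment bypassed
    CHARGE = "charge"          # bot must pay the site's price
    BLOCK = "block"            # bot is refused outright


# Hypothetical rules a domain owner might configure.
POLICY: Dict[str, Action] = {
    "ArchiveBot": Action.ALLOW_FREE,
    "SearchIndexer": Action.ALLOW_FREE,
    "ExampleAIBot": Action.CHARGE,
}


def action_for(crawler: str, policy: Dict[str, Action] = POLICY,
               default: Action = Action.BLOCK) -> Action:
    """Look up a crawler's rule; unlisted crawlers get the default (assumed: block)."""
    return policy.get(crawler, default)


print(action_for("ArchiveBot").value)    # allow_free
print(action_for("ExampleAIBot").value)  # charge
print(action_for("UnknownBot").value)    # block
```

Defaulting to `BLOCK` mirrors an opt-in model: a crawler gets access only when the owner has written a rule for it.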

Longpre argues that raising a drawbridge to crawlers could shrink the internet's biodiversity, and he believes that the openness of the Web is essential for innovation and progress. Gross, on the other hand, contends that AI bots are stealing content and should be made to pay for it.

The pay-per-crawl experiment is still in its early stages, and it remains to be seen how it will impact the Web. However, it is clear that Cloudflare's new policy marks a significant milestone in the relationship between AI, the internet, and the publishers that populate it.

As for which companies have signed up for Cloudflare's pay-per-crawl program, no details have been made public yet. Roughly 20% of the Web sits behind Cloudflare's network, so the impact of this policy could be far-reaching. The future of web crawling may indeed look very different.
