Publishers block Apple Intelligence from accessing their websites

Apple’s tool ‘Applebot-Extended’ allows website owners to prevent their data from being used for AI training

2 min read · Aug 30, 2024

Highlights:

Major publishers like The New York Times and Condé Nast have blocked Applebot-Extended from accessing their data for AI training.
Unlike Google’s open data approach, Apple is negotiating with publishers and offering substantial payments for data access.
The opt-outs reflect the current debates over the use of data for AI training, with many publishers seeking to protect their intellectual property.

Major publishers are opting out of allowing Apple’s AI training tool, Applebot-Extended, to access their content. The New York Times, Condé Nast, Financial Times, The Atlantic, and USA Today are among the publishers that have blocked Apple Intelligence from scraping their data. Social media platforms including Facebook and Instagram have also prevented Apple AI from scraping their data.

Apple introduced ‘Applebot-Extended’ as an upgrade to its original web-crawling bot. The new tool allows publishers to prevent their data from being used to train Apple’s AI models while still allowing basic web crawling for search purposes.

Apple claimed this move was designed to address concerns about intellectual property and data usage by offering publishers greater control over their content.

Using a simple text file, robots.txt, publishers can block AI companies from accessing their web content, ensuring that automated systems do not scrape data without their permission.
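To make the mechanism concrete, here is a minimal sketch in Python of how a robots.txt opt-out might look and how a well-behaved crawler would interpret it. The robots.txt content and the example.com URL are illustrative assumptions, not any publisher's actual file, and the helper name is_allowed is hypothetical. The check uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not a real publisher's file):
# block the Applebot-Extended token site-wide, allow everything else.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/story.html"  # hypothetical URL
    for agent in ("Applebot-Extended", "Applebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, url))
```

In this sketch the Applebot-Extended rule applies only to that user agent, so the regular Applebot search crawler falls through to the catch-all rule and is still allowed. Note that robots.txt is a voluntary convention: it only works when the crawler chooses to honor it.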

Several publishers are now using this file to block Apple’s AI crawler, Applebot-Extended. According to Wired, 6% to 7% of high-traffic websites have blocked Applebot-Extended, while a separate analysis by Ben Welsh found that about 25% of the sites surveyed have done the same.

Some publishers are opting to negotiate significant licensing deals with Apple to control how their content is used in AI training and receive compensation.

This approach reflects the current trend among publishers to protect their content. In May, Reddit CEO Steve Huffman claimed that Microsoft has been using Reddit’s data to help improve its artificial intelligence models.

Companies including Perplexity and Anthropic have also been accused of scraping data without authorization.

Written by The keyword
