- A growing number of major news sites are blocking the Wayback Machine.
- Reportedly, 23 organizations now prevent their content from appearing in the archive.
- The blocks stem from fears that the Wayback Machine is being exploited to mine content for AI training.
The Wayback Machine is under serious threat (and not for the first time), as a growing number of major news websites appear to be blocking the archiving system.
If you’re not familiar with the Wayback Machine, it’s run by the nonprofit Internet Archive and is essentially a time machine that preserves a history of the web (and more). This can be vital when it comes to historical research, for example, or tracking changes to websites.
As Wired reports (via 9to5Mac), a growing number of online news outlets are blocking the web crawler that the Internet Archive uses to collect its snapshots. Some 23 big news sites are now doing so, according to Originality AI (which specializes in AI detection).
This includes the New York Times (based on a Nieman Lab report) and USA Today, with Wired noting that the latter recently published a report on how U.S. Immigration and Customs Enforcement delayed releasing key information about the impact of detention policies. This was a piece that used the Wayback Machine extensively in its research.
The irony of USA Today relying on this data while blocking the Wayback Machine’s access to its own content, an archive that could help keep the news site honest in the future, is not lost on Wayback Machine director Mark Graham.
Graham told Wired: “They can pool their research into the story because the Wayback Machine exists. At the same time, they’re blocking access.”
Of course, if more and more organizations start blocking the Wayback Machine, then its ability to maintain a historical record of online content will be increasingly eroded.
Analysis: Blame the AI (again)
So why is this happening? In case you thought otherwise, it’s not about readers using the Wayback Machine to bypass paid content. Would you be surprised to learn that the cause is, indirectly, AI? Of course not. Predictably, the Internet Archive appears to be caught up in the broad backlash against AI.
What these news organizations say they object to is not the keeping of a historical record of their content, but the fact that this archive can be used by third-party AI companies to train their large language models (LLMs).
As Wired notes, New York Times spokesperson Graham James said, “The problem is that Times content on the Internet Archive is being used by artificial intelligence companies in violation of copyright law to compete directly with us.”
In short, these companies’ concern is that even if they block AI scraping on their own sites, it will continue behind their backs via the Wayback Machine. The worry isn’t limited to the mainstream media: social media platforms share it too, particularly Reddit, which has blocked the Wayback Machine web crawler for the same reason.
While there are other potential sources and ways to scrape news content indirectly, the Wayback Machine is the most obvious target for rogue AI operators as it maintains a very extensive library of web history.
This is a complex topic, tied up with AI scraping and its many legal gray areas. However, the effect on an important resource for keeping governments and media giants in check (holding them accountable for what they have said in the past, or for what has been removed from the web entirely in some cases) is clearly worrying.
Graham states that: “There is no doubt that the blanket shutdown of more of the public network is affecting society’s ability to understand what is happening in our world.”
A petition titled “Journalists applaud Internet Archive’s role in preserving public records” has been created and submitted with over 100 signatures from working journalists. Meanwhile, a dialogue between the Internet Archive and these news publishers is ongoing, so hope for a viable solution is not yet lost.





