- The Wayback Machine is once again threatened by AI
- The AI boom has made the large hard drives needed for this expansive web archive up to three times more expensive.
- This comes on top of another AI-related problem for the Wayback Machine: news sites blocking its web crawler.
It’s an increasingly desperate time for those trying to preserve the history of the web, as AI once again proves to be a serious obstacle for organizations like the Internet Archive. This time, the problem is the rising price of hard drives.
You may remember that last month we covered another way AI has been causing difficulties for the Internet Archive’s Wayback Machine. That story concerned the nonprofit’s web crawler: as part of measures designed to stop AI from scraping their content, online news sites are increasingly blocking the crawler the Internet Archive uses to capture the web page snapshots that make up the archive.
Now, 404 Media reports (via Tom’s Hardware) that the Internet Archive is suffering from AI-induced hard drive shortages, as data centers snap up larger drives for AI workloads.
Yes, the rise of AI isn’t just about LLMs (large language models) eating up RAM and SSDs; it’s affecting hard drives too, with knock-on effects on other components.
The huge hard drives (on the order of 30TB) that the Internet Archive needs to house the Wayback Machine’s historical record are now up to three times more expensive, or simply out of stock. As a result, the rise of AI is now a “very real problem that costs us time and money,” Internet Archive founder Brewster Kahle told 404 Media.
With about 210 petabytes (210,000 TB) of web page snapshots in its library, growing by 100 TB every day (more than three 30TB drives’ worth of new data daily), the scale of this archiving effort is clear.
The Wikimedia Foundation, Wikipedia’s parent organization, is reportedly facing similar struggles, as you might imagine. It has about 65 million items to host, which takes up a lot of disk space. A spokesperson for the Wikimedia Foundation told 404 Media that the main problems are the “purchase of memory and hard drives”, along with server delivery times.
Analysis: There are many workarounds, but what about tape?
So is the Wayback Machine really in danger? Will we see the “living history of the Internet” begin to unravel? Well, there is no immediate danger, as donors and the community around the Wayback Machine are apparently coming together to solve the problem of rising unit costs.
Still, this is clearly a concern going forward, and the blocking of the Internet Archive’s web crawler is an even bigger one. The problem is that news sites want to block AI scraping, but those blocks could be sidestepped if an AI company pulls the content from the Wayback Machine instead. It’s a thorny issue, but talks continue, and hopefully both sides can reach some kind of resolution.
As for storage, if you’re wondering why the Internet Archive can’t switch to tape, the problem is that the Wayback Machine is a “live” archive: people access snapshots of web pages on demand, as if they were browsing the web itself. Serving those requests quickly needs the fast random access that hard drives provide, and tape, which has to be read sequentially, simply can’t deliver that kind of performance here.
The Internet Archive does use tape for long-term content backups, but that’s only part of the puzzle. Hard drives are vital to the day-to-day operation of the Wayback Machine as we know it, because they let it serve users the content they request quickly.

Follow TechRadar on Google News and add us as a preferred source to receive news, reviews and opinions from our experts in your feeds.




