Wikipedia is struggling with voracious AI bot crawlers

Wikimedia has seen the bandwidth used to download multimedia content grow by 50% since January 2024, the foundation said in an update. But that isn't because human readers suddenly developed a voracious appetite for Wikipedia articles, videos, and files on Wikimedia Commons. No, the spike in usage comes from AI crawlers: automated programs scraping Wikimedia's openly licensed images, videos, articles, and other files to train generative AI models.

A sudden increase in bot traffic can slow access to Wikimedia's pages and assets, especially during high-interest events. For example, when Jimmy Carter died in December, there was heightened interest in the video of his debate with President Ronald Reagan, and some users experienced slow page load times. Wikimedia can absorb traffic spikes from human readers during events like these, and the users watching the Carter video shouldn't have caused any problems. But "the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs," Wikimedia said.

The foundation explains that human readers tend to look up specific, and often the same, topics. For example, many people look up the same subject while it is trending. Wikimedia caches content that is requested multiple times in the data center closest to the user, which lets it serve that content faster. Articles and files that haven't been accessed in a while, however, must be served from the core data center, which consumes more resources and therefore costs Wikimedia more money. Since AI crawlers tend to read pages in bulk, they also hit obscure pages that must be served from the core data center.
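To see why bulk crawling is so costly, consider a toy model of this caching setup. The Python sketch below is purely illustrative, not Wikimedia's actual infrastructure; the `EdgeCache` class and the `fetch_from_core_datacenter` helper are hypothetical names invented for this example.

```python
from collections import OrderedDict

def fetch_from_core_datacenter(page: str) -> bytes:
    # Stand-in for the expensive origin fetch described in the article.
    return f"<html>{page}</html>".encode()

class EdgeCache:
    """Tiny LRU cache standing in for a regional edge data center."""

    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.cache: OrderedDict[str, bytes] = OrderedDict()

    def fetch(self, page: str) -> bytes:
        if page in self.cache:
            self.cache.move_to_end(page)  # cache hit: cheap, served nearby
            return self.cache[page]
        content = fetch_from_core_datacenter(page)  # miss: costly origin trip
        self.cache[page] = content
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used page
        return content
```

Human traffic concentrated on a few trending pages keeps the hit rate high, while a crawler that requests every page exactly once turns each request into a guaranteed miss, which matches the cost pattern the foundation describes.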

Looking more closely, Wikimedia found that 65% of its most resource-intensive traffic comes from bots. This has caused constant disruption for its site reliability team, which has to block crawlers all the time before they significantly slow down page access for actual readers. The real problem, as Wikimedia put it, is that the scraping "happens largely without sufficient attribution, which is key to driving new users to participate in the movement." A foundation that relies on people's donations to keep running needs to attract new users and get them to care about its cause. "Our content is free, our infrastructure is not," the foundation said. Wikimedia is now looking to establish sustainable ways for developers and reusers to access its content in the upcoming fiscal year, and it has to, because there is no indication that AI-related traffic will slow down anytime soon.
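The article doesn't say how the site reliability team throttles crawlers in practice. As a hedged illustration of the general class of technique, the sketch below implements a per-client token-bucket rate limiter in Python. Everything here is an assumption for illustration: the `TokenBucket` class, the rates, and the client IP are hypothetical, not Wikimedia's actual tooling.

```python
import time
from collections import defaultdict

class TokenBucket:
    """Allow a steady per-client request rate, with a small burst allowance."""

    def __init__(self, rate: float = 5.0, burst: int = 10):
        self.rate = rate    # tokens refilled per second
        self.burst = burst  # maximum bucket size
        self.tokens = defaultdict(lambda: float(burst))
        self.last_seen = defaultdict(time.monotonic)

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_seen[client_id]
        self.last_seen[client_id] = now
        # Refill tokens in proportion to the time since the last request.
        self.tokens[client_id] = min(self.burst,
                                     self.tokens[client_id] + elapsed * self.rate)
        if self.tokens[client_id] >= 1.0:
            self.tokens[client_id] -= 1.0
            return True
        return False  # over the limit: answer with HTTP 429 instead of the page

# Hypothetical client hammering the site: the first 10 requests pass,
# then the bucket is empty and the rest are rejected.
limiter = TokenBucket(rate=5.0, burst=10)
for _ in range(12):
    if not limiter.allow("203.0.113.7"):
        print("429 Too Many Requests")
```

The design trade-off is that a generous burst size keeps ordinary readers unaffected while still capping the sustained request rate that distinguishes a scraper from a human.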

