- AI companies such as Anthropic are heavily crawling websites, offering little referral traffic.
- Historically, tech firms exchanged data access for web traffic, but AI disrupts this balance.
- Cloudflare tracks crawl-to-refer ratios, revealing AI's impact on web traffic and costs.
I was at a party in San Francisco recently where a classic Silicon Valley discussion began.
The topic was which AI models and chatbots are best to use. For some partygoers, an important filter was how "ethical" the providers were.
One person said they planned to use Anthropic's Claude service because they considered the startup ethical. The company has done impressive work in the area of AI safety. I mentioned that Anthropic has bot crawlers that seem to regularly scrape data from websites while sending very little traffic back. This partygoer was shocked.
Since then, I've been looking for reliable data that shows this important and under-discussed part of the AI revolution. While tech companies spend lavishly on data centers, GPUs, and talent, they avoid talking about the other key ingredient of AI success: data.
That's because they don't want to pay for the high-quality human data that's needed for AI model training, inference, and AI outputs. Instead, they send out bots to crawl websites and scoop up this information, mostly for free.
In the past, tech companies would send users to the original sources of this information. This formed the grand bargain of the web. Sites would let their data be taken for free on the understanding that they would get referrals in return, and could pay for their efforts through advertising, subscriptions, and other techniques.
In the new generative AI world, this deal is breaking down. Now, AI answer engines and chatbots give users direct answers, making people less likely to visit the websites that created the content in the first place.
Cloudflare, which helps run about 20% of the world's websites, has begun tracking this behavior. It measures Big Tech company bots' requests to crawl websites, and the number of referrals the platforms send to sites.
This crawl-to-refer ratio is a useful guide to how much tech companies are taking from the web and how much they're giving back. For example, a ratio of 100 to 1 would mean a company's bots crawled sites 100 times for every 1 referral the company sent.
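For readers who want to see the arithmetic, here is a minimal sketch of the ratio as described above. The function names and the sample counts are illustrative assumptions, not Cloudflare's code or data, and Cloudflare's actual methodology may differ in details not covered here.

```python
def crawl_to_refer_ratio(crawl_requests: int, referrals: int) -> float:
    """Return crawler requests per referral (hypothetical helper).

    A result of 100.0 reads as "100 to 1": 100 crawls for every referral.
    """
    if crawl_requests < 0 or referrals < 0:
        raise ValueError("counts must be non-negative")
    if referrals == 0:
        # Assumption: treat "no referrals at all" as an unbounded ratio.
        return float("inf")
    return crawl_requests / referrals


def format_ratio(ratio: float) -> str:
    """Render a ratio in the "N to 1" style used in this article."""
    return f"{ratio:,.1f} to 1"


# Illustrative numbers only, matching the 100-to-1 example in the text.
example = crawl_to_refer_ratio(crawl_requests=100, referrals=1)
print(format_ratio(example))
```

A higher number means more crawling per visitor sent back; a lower number means the platform returns more traffic for what it takes.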
Is this one way to measure how ethical companies are in the AI era? I'll leave you to decide. Here's the data for the first week in September.
In this data, Anthropic sticks out like a sore thumb. According to Cloudflare, it crawls sites far more often than it sends users out to the web.
This aligns with Business Insider reporting from about a year ago. Back then, we told you that bots from Anthropic and OpenAI, especially, were crawling some websites so much that it was causing their traffic costs to spike dramatically.
One web developer saw a client's cloud-computing costs double within a few months due to this AI bot swarm, according to BI reporting last year.
So, not only are AI companies taking from the web and giving less back — they are also leaving some site owners with bigger bills to pay.
I asked Anthropic why it crawls so much and gives so little back to the web. The startup said it couldn't confirm the crawl-to-refer ratios calculated by Cloudflare and said there may be "issues" with the methodology.
Anthropic also noted that it launched a web search feature for its popular Claude AI chatbot earlier this year. This is generating more referral traffic for websites now, and this is growing quickly, the startup said.
OpenAI did not respond to requests for comment. Perplexity sent a detailed and thoughtful response that partly focused on the emerging ability of bots to represent human users' intentions, such as a desire to access knowledge on the web freely.
"In the case of public content, publishers can choose not to make their content public," Perplexity spokesperson Jesse Dwyer told Business Insider. "In the case of facts, copyright law, as you know, has always drawn a line between facts and expression. That's a foundation of human inquiry itself."
A caveat: The numbers that go into the crawl-to-refer ratio focus on the web and exclude native app activity. If app activity were included, the ratios might be lower. However, this methodology applies to all the companies included in this ranking.
Google's relatively low ratio is likely due to its traditional search engine, which still shows website links in many results. However, the company is increasingly weaving AI chatbot-style answers into its search service, via AI Overviews and AI Mode.
According to Cloudflare data, in the first week of January, Google's crawl-to-refer ratio was 3.3 to 1. That ratio jumped to 18 to 1 in the first week of April and then fell back to 9 to 1 in the first week of July.
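To put those swings in perspective, this short calculation uses only the three Google figures cited above. The variable and function names are illustrative assumptions; the inputs are the ratios as reported, nothing more.

```python
# Google's crawl-to-refer ratios as cited in the text (first week of each month).
google_ratios = {"January": 3.3, "April": 18.0, "July": 9.0}


def fold_change(earlier: float, later: float) -> float:
    """How many times larger (or smaller) the later ratio is than the earlier one."""
    return later / earlier


# Compare each reading with the one before it, in the order listed above.
months = list(google_ratios)
changes = {
    f"{a} -> {b}": fold_change(google_ratios[a], google_ratios[b])
    for a, b in zip(months, months[1:])
}

for period, change in changes.items():
    print(f"{period}: {change:.2f}x")
```

A value above 1 means the ratio grew over the period, and a value below 1 means it shrank.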
Google says it still sends traffic to the web, and it cares about the health of this ecosystem.
Business Insider will track this Cloudflare data in the coming months and quarters to see how this behavior evolves.