June 14, 2024

Reddit has long been a hub for conversation on the internet. About 57 million people visit the site every day to chat about topics as varied as makeup, video games and tips for power-washing driveways.

In recent years, Reddit’s vast collection of chats has also served as a free teaching aid for companies such as Google, OpenAI and Microsoft. These companies use Reddit conversations to develop giant artificial intelligence systems that many in Silicon Valley believe are on their way to becoming the next big thing in tech.

Now Reddit wants to be paid for it. The company said Tuesday that it plans to start charging companies for access to its application programming interface (API), which allows outside entities to download and process the social network’s vast trove of person-to-person conversations.

“Reddit’s corpus of data is incredibly valuable,” Reddit founder and CEO Steve Huffman said in an interview. “But we don’t need to give all of that value to some of the biggest companies in the world for free.”

The move is one of the first significant examples of a social network charging for access to the conversations it hosts so that others can develop AI systems such as OpenAI’s popular program ChatGPT. These new AI systems could one day lead to big businesses, but they are unlikely to help a company like Reddit much. In fact, they could be used to create rivals: automated duplicates of Reddit’s conversations.

Reddit is also preparing for a possible initial public offering on Wall Street this year. Founded in 2005, the company generates most of its revenue from advertising and e-commerce transactions on its platform. Reddit said it is still finalizing the details of what it will charge for API access and will announce prices in the coming weeks.

Reddit’s conversation forums have become a valuable commodity as large language models, or LLMs, have become an essential part of creating new AI technology.

LLMs are essentially complex algorithms developed by companies like Google and OpenAI, which is a close partner of Microsoft. To the algorithms, Reddit conversations are data, and they are among the vast pools of material fed into LLMs to develop them.

The underlying algorithm that helped build Bard, Google’s conversational AI service, was partly trained on Reddit data. OpenAI’s ChatGPT cites Reddit data as one of the sources of information it was trained on.

Other companies are also starting to see the value in the conversations and images they host. Shutterstock, an image hosting service, also sold image data to OpenAI to help create DALL-E, an AI program that creates vivid graphical imagery from nothing more than a text-based prompt.

Last month, Twitter owner Elon Musk said he was cracking down on use of Twitter’s API, which is used by thousands of companies and independent developers to track millions of conversations across the web. Although he did not cite LLMs as a reason for the change, the new fees could run well into the tens or even hundreds of thousands of dollars.

To continuously improve their models, AI makers need two important things: lots of computing power and lots of data. Some of the biggest AI developers have massive computing power but still look outside their networks for the data they need to improve their algorithms. This includes resources like Wikipedia, millions of digitized books, scholarly articles, and Reddit.

Representatives for Google, OpenAI and Microsoft did not immediately respond to requests for comment.

Reddit has long had a symbiotic relationship with search engines from companies like Google and Microsoft. Search engines “crawl” Reddit’s pages to index the information and make it available in search results. That crawling, or “scraping,” is not always welcomed by every site on the internet. But Reddit benefits from ranking high in search results.

The dynamic is different with LLMs: they gobble up as much data as they can to create new AI systems, like chatbots.

Reddit considers its data particularly valuable because it is constantly updated. Mr. Huffman said that kind of newness and relevance is what large language model algorithms need to produce the best results.

“Reddit is a better place for real conversations than anywhere else on the internet,” Mr. Huffman said. “There’s a lot of stuff on the site that you’d only ever say in therapy, or AA, or never at all.”

Mr. Huffman said Reddit’s API will remain free for developers who want to build apps that help people use Reddit. For example, they could use these tools to build a bot that automatically tracks whether user comments comply with posting rules. Researchers who want to study Reddit data for academic or non-commercial purposes will continue to have free access to it.

Reddit also wants to incorporate more so-called machine learning into how the site itself works. For example, it could be used to identify the use of AI-generated text on Reddit and add a label to notify users that comments are from bots.

The company also pledged to improve the software tools available to moderators, the users who volunteer their time to keep the site’s forums running smoothly and improve conversations between users. Third-party bots that help moderators monitor forums will continue to be supported.

But for AI makers, it’s time to pay up.

“Crawling Reddit, creating value and not returning any of that value to our users is something we have a problem with,” Mr. Huffman said. “It’s a good time for us to tighten things up.”

“We think it’s fair,” he added.

