September 28, 2023

When it comes to AI chatbots, usually bigger is better.

Large language models such as ChatGPT and Bard generate fluent conversational text that improves as they are fed more data. Every day, bloggers take to the internet to explain how the latest developments—an app that summarizes articles, an AI-generated podcast, a fine-tuned model that can answer any question related to pro basketball—will “change everything.”

But building bigger, more powerful artificial intelligence requires processing power that few companies have, and fears are growing that a handful of companies, including Google, Meta, OpenAI and Microsoft, will take near-total control of the technology.

Larger language models are also harder to understand. They’re often described as “black boxes,” even by the people who design them, and leading figures in the field have expressed disquiet that AI’s goals may ultimately not align with our own. If bigger is better, it is also more opaque and more exclusive.

In January, a group of young academics working in natural language processing (a branch of artificial intelligence that focuses on language understanding) launched a challenge to upend that paradigm. The group is calling for teams to create functional language models using datasets less than one ten-thousandth the size of those used by state-of-the-art large language models. A successful miniature model will function almost as well as a high-end model, but will be smaller, more accessible and more compatible with humans. The project is called the BabyLM Challenge.

“We’re challenging people to think small and focus more on building efficient systems that more people can use,” said Aaron Mueller, a computer scientist at Johns Hopkins University and organizer of BabyLM.

Alex Warstadt, a computer scientist at ETH Zurich and another organizer of the project, added: “The challenge puts questions about human language learning, rather than ‘How big can we make our models?,’ at the center of the conversation.”

Large language models are neural networks designed to predict the next word in a given sentence or phrase. They are trained for this task using corpora collected from transcripts, websites, novels and newspapers. A typical model makes a guess based on an example phrase, then adjusts itself based on how close its guess was to the correct answer.
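To make that guess-and-adjust loop concrete, here is a minimal sketch in Python (using PyTorch purely for illustration; the toy corpus, model size and training settings are invented for exposition and look nothing like what production models use). A tiny network guesses the next word, measures how far the guess is from the correct answer, and nudges its weights accordingly.

```python
# A minimal sketch of next-word prediction, not any real model's code.
import torch
import torch.nn as nn

# Toy corpus and vocabulary; real models train on billions of words.
corpus = "the cat sat on the mat".split()
vocab = sorted(set(corpus))
word_to_id = {w: i for i, w in enumerate(vocab)}

# (context word, next word) training pairs.
pairs = [(word_to_id[corpus[i]], word_to_id[corpus[i + 1]])
         for i in range(len(corpus) - 1)]

# A deliberately tiny network: embed the context word, predict the next one.
model = nn.Sequential(nn.Embedding(len(vocab), 16), nn.Linear(16, len(vocab)))
loss_fn = nn.CrossEntropyLoss()            # distance from the correct answer
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

for epoch in range(100):
    contexts = torch.tensor([c for c, _ in pairs])
    targets = torch.tensor([t for _, t in pairs])
    logits = model(contexts)               # the model's guesses
    loss = loss_fn(logits, targets)        # how far off those guesses are
    optimizer.zero_grad()
    loss.backward()                        # adjust weights toward better guesses
    optimizer.step()
```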

By repeating this process over and over, a model forms a map of how words relate to each other. In general, the more words a model is trained on, the better it gets; each phrase gives the model context, and more context translates into a more detailed impression of what each word means. OpenAI’s GPT-3, released in 2020, was trained on 200 billion words; DeepMind’s Chinchilla, released in 2022, was trained on trillions.
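As a rough illustration of that “map” (my own toy example, not how GPT-3 or Chinchilla actually represent words), the snippet below counts which words appear next to which in a tiny corpus; words used in similar contexts, like “cat” and “dog,” end up with more similar context vectors, and more text sharpens those estimates.

```python
# A toy co-occurrence "map" of word relationships, for illustration only.
from collections import Counter
import math

corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat chased the dog .").split()
vocab = sorted(set(corpus))

# Count which words appear next to which, within a one-word window.
cooc = {w: Counter() for w in vocab}
for i, w in enumerate(corpus):
    for j in (i - 1, i + 1):
        if 0 <= j < len(corpus):
            cooc[w][corpus[j]] += 1

def similarity(a, b):
    """Cosine similarity between two words' context-count vectors."""
    va, vb = cooc[a], cooc[b]
    dot = sum(va[k] * vb[k] for k in vocab)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

# "cat" and "dog" occur in similar contexts, so they score as more alike
# than "cat" and "mat" do in this tiny corpus.
print(similarity("cat", "dog"), similarity("cat", "mat"))
```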

For ETH Zurich linguist Ethan Wilcox, the fact that something nonhuman can generate language presents an exciting opportunity: Can AI language models be used to study how humans learn language?

For example, nativism, an influential theory dating back to Noam Chomsky’s early work, claims that humans learn language quickly and efficiently because they have an innate understanding of how language works. But language models also learn language quickly, and they don’t seem to have any innate understanding of how language works — so nativism may not hold water.

The catch is that language models learn very differently than humans do. Humans have bodies, social lives and rich sensations. We can smell mulch, feel the vanes of feathers, bump into doors and taste mint. Early on, we are exposed to simple spoken words and syntax that rarely appear in writing. As a result, Dr. Wilcox concluded, a computer that generates language after being trained on vast amounts of written text can only tell us so much about our own linguistic processes.

But if a language model were exposed only to the words a young child encounters, it might be able to interact with language in ways that address certain questions we have about our own abilities.

So Dr. Wilcox, Dr. Mueller and Dr. Warstadt, along with six colleagues, conceived the BabyLM Challenge as an attempt to bring language models slightly closer to human understanding. In January, they issued a call for teams to train a language model on roughly the same number of words (about 100 million) that a 13-year-old human encounters. Candidate models will be tested on their ability to generate and recognize linguistic nuances, and a winner will be announced.
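As a hedged sketch of what one such test of linguistic nuance can look like (my own illustration, using an off-the-shelf GPT-2 from the Hugging Face transformers library as a stand-in, not the challenge’s official evaluation pipeline), a model is asked to score a grammatical sentence and a nearly identical ungrammatical one; a good model should prefer the grammatical version.

```python
# A minimal-pair probe, for illustration only: compare the probability a
# language model assigns to a grammatical vs. an ungrammatical sentence.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence):
    """Total log-probability the model assigns to the sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels makes the model return the average next-token loss.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

good = "The keys to the cabinet are on the table."
bad = "The keys to the cabinet is on the table."
# The grammatical sentence should receive the higher score.
print(sentence_logprob(good) > sentence_logprob(bad))
```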

Eva Portelance, a linguist at McGill University, welcomed the challenge the day it was announced. Her research straddles the often blurred line between computer science and linguistics. The first forays into artificial intelligence, in the 1950s, were motivated by the desire to model human cognitive capacities in computers. The basic unit of information processing in AI is the “neuron,” and early language models of the 1980s and ’90s were directly inspired by the human brain.

But as processors grew more powerful, and companies began developing marketable products, computer scientists realized that it was often easier to train language models on enormous amounts of data than to build them on psychologically informed structures. As a result, Dr. Portelance said, “they give us humanlike text, but there is no connection between us and how they function.”

For scientists interested in understanding how the human mind works, these large models offer limited insight. And because they require enormous processing power, few researchers have access to them. “Only a handful of industry labs with significant resources can afford to train models with billions of parameters on trillions of words,” Dr. Wilcox said.

“Or even to load them,” Dr. Mueller added. “It makes research in the field feel less democratic these days.”

Dr. Portelance said the BabyLM Challenge could be seen as a step away from the arms race toward bigger language models, and a step toward more accessible, more intuitive AI.

The potential of this type of research has not been overlooked by the larger industry labs. Sam Altman, OpenAI’s chief executive, recently said that increasing the size of language models would not bring the same kind of improvements seen in the past few years. Companies such as Google and Meta have also been investing in research into more efficient language models, informed by the architecture of human cognition. After all, a model that can generate language when trained on less data could presumably be scaled up as well.

Whatever profits a successful BabyLM might bring, for those behind the challenge the goals are more academic and abstract. Even the prize subverts the practical. “Just pride,” Dr. Wilcox said.


