The original version of this story appeared in Quanta Magazine.
Large language models work well because they are huge. The latest models from OpenAI, Meta and DeepSeek use hundreds of billions of “parameters,” the adjustable knobs that determine connections among data and get tweaked during training. With more parameters, a model can better identify patterns and connections, which in turn makes it more powerful and accurate.
But this power comes at a price. Training a model with hundreds of billions of parameters takes enormous computational resources. To train its Gemini 1.0 Ultra model, for example, Google reportedly spent $191 million. Large language models (LLMs) also require considerable computing power each time they answer a request, which makes them notorious energy hogs. A single LLM query consumes about 10 times as much energy as a Google search, according to the Electric Power Research Institute.
In response, some researchers are now thinking small. IBM, Google, Microsoft and OpenAI have all recently released small language models (SLMs) that use a few billion parameters, a fraction of their LLM counterparts.
Small models are not meant to be general-purpose tools like their larger cousins. But they can excel at specific, more narrowly defined tasks, such as summarizing conversations, answering patient questions as a health care chatbot, and gathering data in smart devices. “For a lot of tasks, an 8 billion parameter model is actually pretty good,” said Zico Kolter, a computer scientist at Carnegie Mellon University. They can also run on a laptop or cellphone instead of in a huge data center. (There is no consensus on the exact definition of “small,” but the new models all max out around 10 billion parameters.)
To optimize the training process for these small models, researchers use a few tricks. Large models often scrape raw training data from the internet, and this data can be disorganized, messy and hard to process. But these large models can then generate a high-quality data set that can be used to train a small model. The approach, known as knowledge distillation, gets the larger model to effectively pass on its training, the way a teacher gives lessons to a student. “The reason [SLMs] get so good with such small models and such little data is that they use high-quality data instead of the messy stuff,” Kolter said.
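To make the idea concrete, here is a minimal sketch of knowledge distillation in PyTorch. It is an illustration rather than any lab’s actual recipe: a small “student” network is trained to match the softened output probabilities of a larger, frozen “teacher” network, and the layer sizes, temperature and random stand-in data are all assumptions made for the example.

```python
# Minimal knowledge-distillation sketch (illustrative, not a production recipe):
# a small "student" learns to match the softened output distribution of a
# larger, already-trained "teacher". Sizes and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(128, 1024), nn.ReLU(), nn.Linear(1024, 10))
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens probabilities so small differences carry signal

for step in range(100):
    x = torch.randn(32, 128)  # stand-in for a batch of real training examples

    with torch.no_grad():            # the teacher is frozen; only the student learns
        teacher_logits = teacher(x)
    student_logits = student(x)

    # KL divergence between the student's and teacher's softened distributions
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice the teacher would be a trained LLM and the student would learn from curated, teacher-generated data rather than random tensors, but the core mechanic of matching a larger model’s outputs is the same.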
Researchers have also explored ways to create small models by starting with large ones and trimming them down. One method, known as pruning, entails removing unnecessary or inefficient parts of a neural network, the sprawling web of connected data points that underlies a large model.
Pruning was inspired by a real-life neural network, the human brain, which gains efficiency by snipping away connections between synapses as a person ages. Today’s pruning approaches trace back to a 1989 paper in which Yann LeCun, a computer scientist now at Meta, argued that up to 90% of the parameters in a trained neural network could be removed without sacrificing efficiency. He called the method “optimal brain damage.” Pruning can help researchers fine-tune a small language model for a particular task or environment.
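For illustration, here is a toy version of pruning in PyTorch, in the spirit of that idea rather than LeCun’s exact “optimal brain damage” procedure (which ranks parameters by second-derivative information): it simply zeroes out the 90% of weights with the smallest magnitudes in each layer of a small, hypothetical network.

```python
# Toy magnitude pruning (illustrative assumption, not the original OBD method):
# zero out the 90% of weights with the smallest magnitudes in each linear layer.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)  # mask smallest 90%
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

# Check how sparse the network is now
total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"{zeros / total:.0%} of parameters are zero after pruning")
```

A real pipeline would prune a trained model, measure the accuracy lost, and usually fine-tune afterward to recover performance; this sketch only shows the mechanical step of removing connections.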
For researchers interested in how language models do what they do, smaller models offer an inexpensive way to test new ideas. And because they have fewer parameters than large models, their reasoning might be more transparent. “If you want to make a new model, you need to try things,” said Leshem Choshen, a research scientist at the MIT-IBM Watson AI Lab. “Small models allow researchers to experiment with lower stakes.”
Large, expensive models, with their ever-growing parameter counts, will remain useful for applications like generalized chatbots, image generators and drug discovery. But for many users, a small, targeted model will work just as well, while being easier for researchers to train and build. “These efficient models can save money, time and compute,” Choshen said.
Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance public understanding of science by covering research developments and trends in mathematics and the physical and life sciences.