Researchers at Sakana AI have developed a resource-efficient framework that can create hundreds of language models specialized for different tasks. Called CycleQD, the technique uses evolutionary algorithms to combine the skills of different models without the need for expensive and slow training processes.
CycleQD can create task-specific agent swarms that provide a more sustainable alternative to the current paradigm of increasing model sizes.
Rethinking model training
Large language models (LLMs) have shown remarkable capabilities across a variety of tasks. However, training LLMs to master multiple skills remains a challenge. When fine-tuning models, engineers must balance data from different skills and make sure that one skill does not dominate the others. Current approaches often involve training ever-larger models, which leads to growing computational and resource demands.
“We believe that instead of aiming to develop a single large model capable of efficiently performing all tasks, population-based approaches to evolving a diverse swarm of niche models could offer an alternative and more sustainable path to scaling up the development of AI agents with advanced capabilities,” the Sakana researchers write in a blog post.
To create model populations, the researchers drew inspiration from quality diversity (QD), an evolutionary computing paradigm that focuses on discovering a diverse set of solutions from an initial population sample. QD aims to create specimens with various “behavioral characteristics” (BCs), which represent different skill domains. It achieves this through evolutionary algorithms (EAs) that select parent samples and use crossover and mutation operations to create new samples.
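To make the QD loop concrete, here is a minimal, self-contained sketch in Python. It is an illustrative toy under simplified assumptions (genomes are plain parameter vectors, and the quality and behavior functions are placeholders), not Sakana's implementation.

```python
# Minimal quality-diversity (QD) loop: keep an archive of "elites," one per
# behavioral niche, and grow it with crossover and mutation.
# Toy example only: genomes are NumPy vectors; quality() and behavior() are
# placeholder stand-ins for real task scores and skill measurements.
import numpy as np

rng = np.random.default_rng(0)

def quality(genome):
    # Placeholder fitness: prefer genomes close to the origin.
    return -float(np.sum(genome ** 2))

def behavior(genome):
    # Placeholder behavioral characteristics (BCs): discretize two dimensions.
    return tuple(np.round(genome[:2], 1))

archive = {}  # maps a BC cell to the best genome found for that cell

def try_insert(genome):
    cell = behavior(genome)
    if cell not in archive or quality(genome) > quality(archive[cell]):
        archive[cell] = genome

# Seed the archive with an initial population sample.
for _ in range(20):
    try_insert(rng.normal(size=8))

# Evolutionary loop: select parents, apply crossover and mutation, re-insert.
for _ in range(200):
    parents = list(archive.values())
    i, j = rng.integers(len(parents), size=2)
    child = 0.5 * (parents[i] + parents[j])                   # crossover
    child = child + rng.normal(scale=0.1, size=child.shape)   # mutation
    try_insert(child)

print(f"archive holds {len(archive)} behaviorally distinct elites")
```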
CycleQD
CycleQD integrates QD into the post-training pipeline of LLMs to help them learn new, complex skills. CycleQD is useful when you have several small models that have each been fine-tuned for a very specific skill, such as coding, database operations or operating system operations, and you want to create new variants that combine these skills in different ways.
In the CycleQD framework, each of these skills is treated as a behavioral characteristic or as a quality that the next generation of models is optimized for. In each generation, the algorithm focuses on one specific skill as its quality metric while using the other skills as BCs.
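The cycling of the quality metric can be sketched in a few lines. The skill names below come from the tasks discussed later in the article, but the rotation schedule itself is an assumption for illustration, not code from the paper.

```python
# Sketch of cycling the optimization target: each generation, one skill is
# treated as the quality metric and the remaining skills serve as BCs.
SKILLS = ["coding", "database_operations", "operating_system_operations"]

def select_objective(generation):
    """Rotate which skill is optimized as quality; the rest define diversity."""
    quality_skill = SKILLS[generation % len(SKILLS)]
    bc_skills = [s for s in SKILLS if s != quality_skill]
    return quality_skill, bc_skills

for gen in range(6):
    q, bcs = select_objective(gen)
    print(f"generation {gen}: optimize {q!r}, map diversity over {bcs}")
```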
“This ensures that each skill is emphasized, allowing LLMs to become more well-rounded and more successful overall,” the researchers explain.
CycleQD begins with a set of expert LLMs, each specialized in a single skill. The algorithm then applies “crossover” and “mutation” operations to add new, higher-performing models to the population. Crossover combines the characteristics of two parent models to create a new model, while mutation makes random changes to the model to explore new possibilities.
The crossover operation is based on model merging, a technique that combines the parameters of two LLMs to create a new model with combined skills. It is a cost-effective and fast way to create well-rounded models without the need for additional fine-tuning.
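As a rough illustration of merging-based crossover, the sketch below linearly interpolates the weights of two parent checkpoints. Real merging recipes, including CycleQD's crossover, are more elaborate; the merge_state_dicts helper and its alpha parameter are hypothetical names used only for this example.

```python
# Hedged sketch of crossover via model merging: element-wise interpolation of
# two parents' parameters. Assumes both models share the same architecture and
# that all merged tensors are floating point.
import torch

def merge_state_dicts(parent_a, parent_b, alpha=0.5):
    """Return a child state dict blending two parent state dicts."""
    child = {}
    for name, tensor_a in parent_a.items():
        tensor_b = parent_b[name]
        child[name] = alpha * tensor_a + (1.0 - alpha) * tensor_b
    return child

# Usage with two fine-tuned checkpoints of the same base model (hypothetical):
# child_sd = merge_state_dicts(coder.state_dict(), db_expert.state_dict(), alpha=0.3)
# child_model.load_state_dict(child_sd)
```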
The mutation operation uses singular value decomposition (SVD), a factorization method that breaks any matrix down into simpler components, making its elements easier to understand and manipulate. CycleQD uses SVD to break down the model's skills into core components, or sub-skills. By tweaking these sub-skills, the mutation process creates models that explore new capabilities beyond those of their parent models. This helps the models avoid getting stuck in predictable patterns and reduces the risk of overfitting.
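The sketch below shows the general idea of an SVD-based mutation applied to a single weight matrix: decompose it, perturb the singular values, and reconstruct. It is a simplified stand-in for CycleQD's mutation step; the svd_mutate helper and its noise_scale parameter are illustrative assumptions.

```python
# Simplified SVD mutation: scale each singular value ("sub-skill") by a small
# random factor and rebuild the weight matrix.
import torch

def svd_mutate(weight, noise_scale=0.01):
    """Perturb the singular values of a 2-D weight tensor."""
    u, s, vh = torch.linalg.svd(weight, full_matrices=False)
    s_mutated = s * (1.0 + noise_scale * torch.randn_like(s))
    return u @ torch.diag(s_mutated) @ vh

# Usage on one linear layer of a model (hypothetical layer object):
# with torch.no_grad():
#     layer.weight.copy_(svd_mutate(layer.weight))
```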
CycleQD performance evaluation
The researchers applied CycleQD to a set of Llama 3-8B expert models fine-tuned for coding, database operations and operating system operations. The goal was to see whether the evolutionary method could combine the skills of the three models to create a superior all-around model.
The results showed that CycleQD outperformed traditional fine-tuning and model merging methods across the evaluated tasks. Notably, a model fine-tuned on all of the datasets combined performed only marginally better than the single-skill expert models, despite being trained on more data. Moreover, that traditional training process is much slower and more expensive. CycleQD was also able to produce a range of models with different performance profiles on the target tasks.
“These results clearly show that CycleQD outperforms traditional methods, proving its effectiveness in training LLMs to excel in multiple skills,” the researchers write.
Researchers believe CycleQD has the potential to enable lifelong learning in AI systems, allowing them to continually grow, adapt, and accumulate knowledge over time. This can have direct implications for real-world applications. For example, CycleQD can be used to continually merge the skills of expert models instead of training a large model from scratch.
Another exciting direction is the development of multi-agent systems, in which swarms of specialized agents evolved through CycleQD can collaborate, compete and learn from each other.
“From scientific discovery to solving real-world problems, swarms of specialized agents could redefine the limits of AI,” the researchers write.