Ahmad Al-Dahle, vice president of generative AI at Meta, took to rival social network X today to announce the release of Llama 3.3, the latest open-source multilingual large language model (LLM) from the parent company of Facebook, Instagram, WhatsApp and Quest VR.
As he wrote: “Llama 3.3 improves base performance at a significantly lower cost, making it even more accessible to the entire open source community.”
With 70 billion parameters (the settings governing model behavior), Llama 3.3 delivers results comparable to those of Meta's 405-billion-parameter Llama 3.1, released back in the summer, but at a fraction of the cost and computational overhead, such as the GPU capacity needed to run the model for inference.
It's designed to deliver class-leading performance and affordability, in a smaller form factor than previous base models.
Meta's Llama 3.3 is offered under the Llama 3.3 Community License Agreement, which grants a non-exclusive, royalty-free license to use, reproduce, distribute and modify the model and its output. Developers integrating Llama 3.3 into products or services must include appropriate attribution, such as “Built with Llama,” and adhere to an acceptable use policy that prohibits activities such as generating harmful content, violating laws or enabling cyberattacks. Although the license is generally free, organizations with more than 700 million monthly active users must obtain a commercial license directly from Meta.
A statement from the AI at Meta team highlights this vision: “Llama 3.3 delivers industry-leading performance and quality in text-based use cases, at a fraction of the cost of inference.”
How much savings are we talking about, really? Some back-of-the-envelope calculations:
Llama 3.1-405B requires between 243 GB and 1,944 GB of GPU memory, according to the Substrate blog (for the open-source cross-cloud substrate). Meanwhile, the older Llama 2-70B requires between 42 GB and 168 GB of GPU memory according to the same blog, though some have claimed to run it in as little as 4 GB, and Exo Labs showed it off on Mac computers equipped with M4 chips and no discrete GPUs.
Therefore, if the GPU savings for lower-parameter models hold in this case, those looking to deploy Meta's most powerful open-source Llama models can expect to save up to nearly 1,940 GB of GPU memory, potentially a 24-fold reduction in GPU load relative to a standard 80 GB Nvidia H100 GPU.
At an estimated $25,000 per H100 GPU, that's potentially up to $600,000 in upfront GPU cost savings, not to mention ongoing power costs.
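The back-of-the-envelope math above can be sketched as follows, using only the figures already cited (the 1,944 GB upper bound for Llama 3.1-405B, the 4 GB lower bound for a 70B-class model, and the $25,000 H100 estimate); actual requirements vary widely by quantization and serving stack:

```python
# Illustrative GPU cost comparison using the figures cited above.
# These are rough estimates, not measured deployment requirements.

H100_MEMORY_GB = 80          # memory of one Nvidia H100 GPU
H100_PRICE_USD = 25_000      # estimated purchase price per H100

llama_405b_mem_gb = 1944     # upper-bound memory for Llama 3.1-405B
llama_70b_mem_gb = 4         # claimed lower bound for a 70B-class model

# Ceiling division: how many 80 GB H100s each deployment would need
gpus_405b = -(-llama_405b_mem_gb // H100_MEMORY_GB)
gpus_70b = -(-llama_70b_mem_gb // H100_MEMORY_GB)

savings_gb = llama_405b_mem_gb - llama_70b_mem_gb
savings_usd = (gpus_405b - gpus_70b) * H100_PRICE_USD

print(f"Memory saved: {savings_gb} GB")
print(f"GPUs saved: {gpus_405b - gpus_70b}")
print(f"Upfront cost saved: ${savings_usd:,}")
```

Running the sketch reproduces the article's headline numbers: roughly 1,940 GB of memory, 24 fewer GPUs and about $600,000 in upfront hardware.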
A high-performance model in a small format
According to Meta AI's post on X, the Llama 3.3 model greatly outperforms the identically sized Llama 3.1-70B as well as Amazon's new Nova Pro model on several benchmarks, such as multilingual dialogue, reasoning and other advanced natural language processing (NLP) tasks (Nova outperforms it on HumanEval coding tasks).
Llama 3.3 was pre-trained on 15 trillion tokens of “publicly available” data and fine-tuned on over 25 million synthetically generated examples, according to information Meta provided in the “model card” posted on its website.
Training the model consumed 39.3 million GPU hours on H100-80GB hardware, a development process Meta says highlights its commitment to energy efficiency and sustainability.
Llama 3.3 handles multilingual reasoning tasks with an accuracy of 91.1% on MGSM, demonstrating its effectiveness in supporting languages such as German, French, Italian, Hindi, Portuguese, Spanish and Thai, in addition to English.
Economical and environmentally friendly
Llama 3.3 is specifically optimized for cost-effective inference, with token generation costs as low as $0.01 per million tokens.
This makes the model very competitive with industry peers such as GPT-4 and Claude 3.5, at a more affordable price for developers looking to deploy sophisticated AI solutions.
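To put the $0.01-per-million-token figure in perspective, here is a minimal cost-estimation sketch. The figure comes from the article; the workload numbers (reply length, request volume) are hypothetical assumptions for illustration, and real provider pricing varies:

```python
# Illustrative serving-cost estimate using the $0.01 per million
# generated tokens figure cited above. Workload numbers are made up.

PRICE_PER_MILLION_TOKENS_USD = 0.01

def monthly_cost(tokens_per_request: int, requests_per_day: int) -> float:
    """Estimated monthly token-generation cost in USD (30-day month)."""
    tokens_per_month = tokens_per_request * requests_per_day * 30
    return tokens_per_month / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD

# e.g. a chatbot producing 500-token replies to 100,000 requests a day
print(f"${monthly_cost(500, 100_000):.2f} per month")
```

At that rate, even a service generating 1.5 billion tokens a month would spend on the order of tens of dollars on token generation.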
Meta also highlighted the environmental responsibility of this release. Despite its intensive training process, the company leveraged renewable energy to offset greenhouse gas emissions, achieving net zero emissions for the training phase. Geolocated emissions stood at 11,390 tonnes of CO2 equivalent, but Meta's renewable energy initiatives ensured sustainability.
Advanced features and deployment options
The model introduces several improvements, including a longer context window of 128,000 tokens (comparable to GPT-4o, or approximately 400 pages of book text), making it suitable for long-form content generation and other advanced use cases.
Its architecture incorporates Grouped Query Attention (GQA), improving scalability and performance during inference.
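A minimal sketch of why GQA improves inference scalability: groups of query heads share a smaller set of key/value heads, shrinking the KV cache that must be kept in GPU memory per sequence. The head counts below (64 query heads, 8 KV heads, 80 layers, 128-dimensional heads) match publicly reported Llama 70B configurations but should be treated as assumptions here:

```python
# Sketch: KV-cache size under standard multi-head attention (MHA),
# where every query head keeps its own K/V, versus grouped query
# attention (GQA), where K/V heads are shared across query groups.
# Head counts are assumed from publicly reported Llama 70B configs.

def kv_cache_bytes(n_kv_heads, head_dim, n_layers, seq_len, bytes_per_val=2):
    # factor of 2 covers keys and values; fp16 = 2 bytes per element
    return 2 * n_kv_heads * head_dim * n_layers * seq_len * bytes_per_val

LAYERS, HEAD_DIM, CONTEXT = 80, 128, 128_000

mha = kv_cache_bytes(64, HEAD_DIM, LAYERS, CONTEXT)  # all 64 heads keep K/V
gqa = kv_cache_bytes(8, HEAD_DIM, LAYERS, CONTEXT)   # 8 shared K/V heads

print(f"MHA KV cache: {mha / 2**30:.1f} GiB per 128K-token sequence")
print(f"GQA KV cache: {gqa / 2**30:.1f} GiB ({mha // gqa}x smaller)")
```

With 8 shared K/V heads instead of 64, the cache shrinks eightfold, which is a large part of how a 70B model serves long contexts affordably.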
Designed to align with user preferences for safety and utility, Llama 3.3 uses reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). This alignment ensures robust refusals of inappropriate prompts and assistant-like behavior optimized for real-world applications.
Llama 3.3 is already available for download from Meta, Hugging Face, GitHub and other platforms, with integration options for researchers and developers. Meta also offers resources such as Llama Guard 3 and Prompt Guard to help users deploy the model safely and responsibly.