
Meta’s answer to DeepSeek is here: Llama 4 launches with long context Scout and Maverick models, and 2T parameter Behemoth on the way!




The entire AI landscape shifted in January 2025 after the then little-known Chinese AI startup DeepSeek (an offshoot of the Hong Kong-based quantitative analysis firm High-Flyer Capital Management) launched its powerful open source language reasoning model, DeepSeek R1, publicly to the world, besting U.S. giants such as Meta.

As DeepSeek usage spread rapidly among researchers and businesses, Meta was reportedly sent into panic mode upon learning that this new R1 model had been trained for a fraction of the cost of many other leading models – reportedly just a few million dollars, or what Meta pays some of its own AI team leaders.

Meta’s entire generative AI strategy had until then been predicated on releasing best-in-class open source models under its brand name “Llama” for researchers and companies to build upon freely (at least, if they have fewer than 700 million monthly users; beyond that, they must contact Meta for special paid licensing terms). Yet DeepSeek R1’s strong performance on a far smaller budget had allegedly shaken the company’s leadership and forced a reckoning of sorts, with the last version of Llama, 3.3, having been released just a month earlier in December 2024 yet already looking outdated.

Now we know the fruits of that effort: today, Meta founder and CEO Mark Zuckerberg took to his Instagram account to announce a new Llama 4 series of models, two of which – the 400-billion-parameter Llama 4 Maverick and the 109-billion-parameter Llama 4 Scout – are available today for developers to download and begin using or fine-tuning on llama.com and the AI code sharing community Hugging Face.

A massive 2-trillion-parameter Llama 4 Behemoth is also being previewed today, though Meta’s blog post on the release said it is still in training, with no indication of when it will be released. (Recall that parameters are the settings that govern a model’s behavior, and that more of them generally means a more powerful and complex model all around.)

One headline feature of these models is that they are all multimodal – trained to receive and generate text, video, and imagery (though audio is not mentioned).

Another is that they have incredibly long context windows – 1 million tokens for Llama 4 Maverick and 10 million for Llama 4 Scout – equivalent to roughly 1,500 and 15,000 pages of text, respectively, all of which can be handled in a single input/output interaction. That means a user could theoretically upload or paste up to 7,500 pages’ worth of text and receive as much back from Llama 4 Scout, which would be handy for information-dense fields such as medicine, science, engineering, mathematics, and literature.
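
For a sense of where those page counts come from, here is the rough arithmetic, assuming the common rules of thumb of about 0.75 words per token and 500 words per page (my assumptions, not figures from Meta):

```python
# Rough token-to-page conversion (assumed ratios, not Meta's figures).
WORDS_PER_TOKEN = 0.75   # typical English tokenization rule of thumb
WORDS_PER_PAGE = 500     # typical manuscript page

def tokens_to_pages(tokens: int) -> float:
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

print(f"Maverick: {tokens_to_pages(1_000_000):,.0f} pages")   # ~1,500
print(f"Scout:    {tokens_to_pages(10_000_000):,.0f} pages")  # ~15,000
```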

Here’s what else we’ve learned about this release so far:

A mixture-of-experts architecture

All three models use the “mixture-of-experts (MoE)” architecture approach popularized in earlier model releases from OpenAI and Mistral, which essentially combines multiple smaller models (“experts”) specialized in different tasks, subjects, and media formats into a unified, larger model. Llama 4 Maverick, for instance, is billed as a mixture of 128 different experts, and the models run more efficiently because only the expert needed for a particular task, plus a “shared” expert, handles each token, instead of the entire model having to run for every one.

As stated in the Llama 4 blog post:

As a result, while all parameters are stored in memory, only a subset of the total parameters are activated while serving these models. This improves inference efficiency by lowering model serving costs and latency – Llama 4 Maverick can be run on a single [NVIDIA] H100 DGX host for easy deployment, or with distributed inference for maximum efficiency.
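
To make the mixture-of-experts idea concrete, here is a minimal sketch of token routing with a shared expert. It illustrates the general technique, not Meta’s implementation: the layer sizes are arbitrary, the default expert count is shrunk for demo purposes (Maverick is billed at 128), and production MoE layers typically route to the top-k experts with load balancing rather than this bare top-1 scheme:

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Illustrative top-1 mixture-of-experts layer with a shared expert."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8):  # Maverick-scale: 128
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each token per expert
        make_ffn = lambda: nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.experts = nn.ModuleList(make_ffn() for _ in range(n_experts))
        self.shared = make_ffn()  # always-on expert that sees every token

    def forward(self, x):  # x: (n_tokens, d_model)
        expert_idx = self.router(x).argmax(dim=-1)  # pick one expert per token
        routed = torch.zeros_like(x)
        for i in expert_idx.unique():  # only the chosen experts run at all
            mask = expert_idx == i
            routed[mask] = self.experts[i](x[mask])
        return self.shared(x) + routed

print(MoELayer()(torch.randn(16, 512)).shape)  # torch.Size([16, 512])
```

The key point mirrors the quote above: every expert’s weights sit in memory, but each token only exercises the shared expert plus its routed expert, which is why active parameters (17B) are so much smaller than total parameters (400B).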

Both Scout and Maverick are available to the public for self-hosting, while no hosted API or pricing tiers have been announced for official Meta infrastructure. Instead, Meta is focusing on distribution through open downloads and integration with Meta AI in WhatsApp, Messenger, Instagram, and on the web.

Meta estimates the inference cost of Llama 4 Maverick at $0.19 to $0.49 per 1 million tokens (using a 3:1 blend of input and output). That makes it substantially cheaper than proprietary models such as GPT-4o, which is estimated to cost $4.38 per million tokens, based on community benchmarks.
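
For clarity, a “3:1 blend” simply weights the separate input and output per-token rates by an assumed three input tokens for every output token. The individual rates below are placeholders I chose to land inside Meta’s quoted range; Meta only published the blended figure:

```python
# How a 3:1 blended price is computed. The input/output rates here are
# illustrative placeholders; only the blended $0.19-$0.49 range is Meta's.
def blended_price_per_million(input_rate: float, output_rate: float, ratio: int = 3) -> float:
    """Cost per 1M tokens, assuming `ratio` input tokens per output token."""
    return (ratio * input_rate + output_rate) / (ratio + 1)

print(blended_price_per_million(0.11, 0.43))  # 0.19 with these assumed rates
```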

All three Llama 4 models – especially Maverick and Behemoth – are explicitly designed for reasoning, coding, and step-by-step problem solving, though they do not appear to exhibit the chains of thought of dedicated reasoning models such as the OpenAI “o” series or DeepSeek R1.

Instead, they seem designed to compete more directly with “classical,” non-reasoning LLMs and multimodal models such as OpenAI’s GPT-4o and DeepSeek’s V3 – with the exception of Llama 4 Behemoth, which does appear to threaten DeepSeek R1 (more on this below!).

In addition, for Llama 4, Meta built custom post-training pipelines focused on enhancing reasoning, such as:

  • Removing over 50% of “easy” prompts during supervised fine-tuning.
  • Adopting a continuous reinforcement learning loop with progressively harder prompts.
  • Using pass@k evaluation and curriculum sampling to strengthen performance in math, logic, and coding (a minimal sketch of pass@k follows this list).
  • Implementing MetaP, a new technique that lets engineers tune hyperparameters (such as per-layer learning rates) on one model and apply them to other model sizes and token types while preserving the intended model behavior.
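
The pass@k metric mentioned above is standard in code and math evaluation: generate n samples per problem, count the c that pass, and estimate the chance that at least one of k drawn samples succeeds. Below is the usual unbiased estimator from Chen et al.’s 2021 HumanEval paper; whether Meta’s pipeline uses this exact formulation is not stated in the blog post:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021):
    1 - C(n-c, k) / C(n, k), computed stably as a product."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

print(pass_at_k(n=200, c=30, k=10))  # ~0.81 for this example
```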

MetaP is particularly interesting because it could be used going forward to set hyperparameters on one model and then derive many other model types from it, increasing training efficiency.

As my VentureBeat colleague and LLM expert Ben Dickson put it: “This can save a lot of time and money. It means they can run experiments on smaller models instead of on the large-scale ones.”

This is especially important when training models as large as Behemoth, which uses 32K GPUs and FP8 precision, achieving 390 TFLOPs per GPU over more than 30 trillion tokens – more than double the Llama 3 training data.
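
Some back-of-envelope math on what those figures imply, using the common ~6 × (active parameters) × (tokens) rule of thumb for training FLOPs. Behemoth’s 288B active-parameter figure comes from Meta’s announcement; the resulting wall-clock estimate is mine and ignores downtime, data pipeline stalls, and everything else real training runs hit:

```python
# Rough training-time estimate from the quoted figures (my arithmetic,
# not a disclosed schedule). Uses the ~6*N*D FLOPs rule of thumb.
active_params = 288e9          # Behemoth's reported active parameters
tokens = 30e12                 # >30 trillion training tokens
gpus = 32_000
flops_per_gpu = 390e12         # sustained FP8 throughput per GPU

total_flops = 6 * active_params * tokens     # ~5.2e25 FLOPs
cluster_rate = gpus * flops_per_gpu          # ~1.25e19 FLOPs/sec
print(f"~{total_flops / cluster_rate / 86_400:.0f} days")  # ~48 days
```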

In other words: with MetaP, researchers can broadly tell a model how they want it to behave, then apply those settings to larger versions of the model and across different forms of media.
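
Meta’s post does not spell out how MetaP works internally. The general family of techniques it evokes – transferring hyperparameters tuned on a small proxy model to a larger one, as in µP-style “µTransfer” – can be sketched roughly like this; the scaling rule and all names below are illustrative assumptions, not MetaP itself:

```python
# Hypothetical sketch of width-scaled learning-rate transfer, in the spirit
# of muP-style methods. MetaP's actual rules are not public; this scaling
# is an illustrative assumption.
def transfer_layer_lrs(base_lrs: dict[str, float],
                       base_width: int, target_width: int) -> dict[str, float]:
    """Rescale per-layer LRs tuned at base_width for a wider model."""
    scale = base_width / target_width  # wider layers get proportionally smaller LRs
    return {name: lr * scale for name, lr in base_lrs.items()}

# Tune cheaply on a small proxy, then transfer to the big run.
proxy_lrs = {"attention": 3e-3, "mlp": 2e-3}
print(transfer_layer_lrs(proxy_lrs, base_width=1024, target_width=8192))
# {'attention': 0.000375, 'mlp': 0.00025}
```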

A powerful – but not yet the most powerful – model family

In his announcement video on Instagram (naturally), Meta CEO Mark Zuckerberg said the company’s “goal is to build the world’s leading AI, open source it, and make it universally accessible so that everyone in the world benefits… I’ve said for a while that I think open source AI is going to become the leading models, and with Llama 4, that is starting to happen.”

It’s a clearly careful statement, as is Meta’s blog post calling Llama 4 Scout “the best multimodal model in the world in its class and more powerful than all previous generation Llama models” (emphasis mine).

In other words, these are very powerful models – near the top of the heap compared to others in their parameter-size class – but not necessarily setting new performance records. Still, Meta was keen to tout the models its new Llama 4 family beats. Among them:

Llama 4 Behemoth

  • Outperforms GPT-4.5, Gemini 2.0 Pro, and Claude Sonnet 3.7 on:
    • MATH-500 (95.0)
    • GPQA Diamond (73.7)
    • MMLU Pro (82.2)

Llama 4 Maverick

  • Beats GPT-4o and Gemini 2.0 Flash on most multimodal reasoning benchmarks:
    • ChartQA, DocVQA, MathVista, MMMU
  • Competitive with DeepSeek v3.1 (45.8B parameters) while using fewer than half the active parameters (17B)
  • Benchmark scores:
    • ChartQA: 90.0 (vs. GPT-4o’s 85.7)
    • DocVQA: 94.4 (vs. 92.8)
    • MMLU Pro: 80.5
  • Cost-effective: $0.19–$0.49 per million tokens

Llama 4 Scout

  • Matches or beats models such as Mistral 3.1, Gemini 2.0 Flash-Lite, and Gemma 3 on:
    • DocVQA: 94.4
    • MMLU Pro: 74.3
    • MathVista: 70.7
  • Unrivaled 10-million-token context length – ideal for long documents, codebases, or multi-turn analysis
  • Designed for efficient deployment on a single H100 GPU

How does Llama 4 stack up against DeepSeek?

But of course, there is a whole other class of reasoning-heavy models such as DeepSeek R1, OpenAI’s “o” series (such as o1), Gemini 2.0, and Claude Sonnet.

Using the highest-parameter model benchmarked – Llama 4 Behemoth – and comparing it to the initial DeepSeek R1 release chart covering the R1-32B and OpenAI o1 models, here is how Llama 4 Behemoth stacks up:

| Benchmark    | Llama 4 Behemoth | DeepSeek R1 | OpenAI o1-1217 |
|--------------|------------------|-------------|----------------|
| MATH-500     | 95.0             | 97.3        | 96.4           |
| GPQA Diamond | 73.7             | 71.5        | 75.7           |
| MMLU         | 82.2             | 90.8        | 91.8           |

What conclusions can we draw?

  • MATH-500: Llama 4 Behemoth trails slightly behind DeepSeek R1 and OpenAI o1.
  • GPQA Diamond: Behemoth is ahead of DeepSeek R1, but behind OpenAI o1.
  • MMLU: Behemoth trails both, but still outperforms Gemini 2.0 Pro and GPT-4.5.

Takeaway: while DeepSeek R1 and OpenAI o1 edge out Behemoth on a few metrics, Llama 4 Behemoth remains highly competitive and performs at or near the top of the reasoning leaderboard in its class.

Safety and political “bias”

Meta also aims to help developers detect unsafe inputs/outputs and adversarial prompts by offering tools such as Llama Guard, Prompt Guard, and CyberSecEval, and by implementing automated red-teaming via Generative Offensive Agent Testing (GOAT).
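
Llama Guard, for instance, is itself a language model that classifies conversations as safe or unsafe. A minimal sketch of calling one of its checkpoints through Hugging Face transformers might look like the following; the model ID and the exact output format differ by Llama Guard version, so treat this as illustrative and check the model card:

```python
# Illustrative use of a Llama Guard checkpoint via transformers.
# Model ID and output format vary by version -- consult the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-8B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

chat = [{"role": "user", "content": "Tell me how to pick a lock."}]
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
out = model.generate(input_ids, max_new_tokens=32)

# The model replies with "safe" or "unsafe" plus any violated category codes.
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```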

The company also claimed that Llama 4 shows a significant improvement on “political bias,” saying: “specifically, [leading LLMs] historically have leaned left when it comes to debated political and social topics” – a nod to Zuckerberg’s embrace of Republican U.S. President Donald J. Trump and his party following the 2024 election.

Where Llama 4 stands so far

Meta’s Llama 4 models bring together efficiency, openness, and high-end performance across multimodal and reasoning tasks.

With Scout and Maverick now publicly available and Behemoth previewed as a state-of-the-art teacher model, the Llama ecosystem is positioned to offer a competitive open alternative to top proprietary models from OpenAI, Anthropic, DeepSeek, and Google.

Whether you are building enterprise-scale assistants, AI research pipelines, or long-context analytical tools, Llama 4 offers flexible, high-performance options with a clear orientation toward reasoning-first design.

