Meta released a new collection of AI models, Llama 4, as part of its Llama family on Saturday.
The three new models are:
- Llama 4 Scout
- Llama 4 Maverick
- Llama 4 Behemoth
Meta says they were all trained on massive amounts of unlabeled text, image, and video data to give them a broad visual understanding.
Models from the Chinese AI lab DeepSeek matched or outperformed Meta’s earlier Llama models, reportedly prompting Meta to accelerate Llama development. Meta set up teams to study how DeepSeek reduced costs for running and deploying models like R1 and V3.
Scout and Maverick are available on llama.com and through Meta’s partners, such as the AI development platform Hugging Face. Behemoth is still being trained. Meta says its AI assistant, Meta AI, now uses Llama 4 in 40 countries. For now, multimodal features are only available in English in the US across apps like WhatsApp, Messenger, and Instagram.
Developers might have concerns about the Llama 4 license. Users and companies domiciled in or with a principal place of business in the EU are prohibited from using or distributing the models, likely due to governance requirements imposed by the region’s AI and data privacy laws, which Meta has called overly burdensome. In addition, as with previous Llama releases, companies with more than 700 million monthly active users must request a special license from Meta, which Meta can grant or deny at its sole discretion.
These Llama 4 models mark the beginning of a new era for the Llama ecosystem. Meta wrote in a blog post: “This is just the beginning for the Llama 4 collection.”
Meta says that Llama 4 is its first cohort of models to use a mixture-of-experts (MoE) architecture, which is more computationally efficient for both training and query answering. An MoE architecture breaks data-computation tasks into subtasks and delegates them to smaller, specialized “expert” models.
Maverick, for example, has 400 billion total parameters, but only 17 billion active parameters spread across 128 experts. Parameters are a rough measure of a model’s problem-solving ability. Scout, meanwhile, has 17 billion active parameters, 16 experts, and 109 billion total parameters.
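The routing idea behind those active-versus-total parameter counts can be sketched in a few lines. This is a toy illustration, not Meta’s actual architecture: the dimensions are made up, and each “expert” is just a random matrix. The point is that a router scores all experts but runs only the top-k, so most parameters sit idle on any given input.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TinyMoELayer:
    """Toy mixture-of-experts layer: a router picks the top-k experts
    per input, so only a fraction of parameters are 'active'."""
    def __init__(self, dim, n_experts, top_k):
        self.top_k = top_k
        self.router = rng.standard_normal((dim, n_experts))
        # Each "expert" here is just a dense weight matrix.
        self.experts = [rng.standard_normal((dim, dim)) for _ in range(n_experts)]

    def __call__(self, x):
        scores = softmax(x @ self.router)          # gate score for every expert
        chosen = np.argsort(scores)[-self.top_k:]  # indices of the top-k experts
        # Weighted sum over only the chosen experts; the rest never run.
        out = sum(scores[i] * (x @ self.experts[i]) for i in chosen)
        return out, sorted(chosen.tolist())

layer = TinyMoELayer(dim=8, n_experts=16, top_k=2)
x = rng.standard_normal(8)
y, used = layer(x)
print(used)  # only 2 of the 16 experts ran for this input
```

Scaled up, this is why a model like Scout can report 109 billion total parameters while activating only 17 billion per query: the router engages a small subset of experts for each token.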
According to Meta’s internal testing, Maverick, which the company says is best for general assistant and chat use cases like creative writing, exceeds models such as OpenAI’s GPT-4o and Google’s Gemini 2.0 on certain coding, reasoning, multilingual, long-context, and image benchmarks. However, Maverick doesn’t quite measure up to more capable recent models, such as Google’s Gemini 2.5 Pro, Anthropic’s Claude 3.7 Sonnet, and OpenAI’s GPT-4.5.
Scout excels at summarizing documents and assessing large codebases thanks to its 10-million-token context window. Tokens are pieces of raw text, like splitting ‘fantastic’ into ‘fan’, ‘tas’, and ‘tic’. This means Scout can process images and millions of words at once, making it useful for very long documents.
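To give a rough sense of scale, here is a back-of-the-envelope calculation of what a 10-million-token window holds. The tokens-per-word ratio and novel length below are common rules of thumb, not statistics from Meta’s tokenizer.

```python
# Rough illustration of why a 10-million-token context window is large.
# Real models use subword tokenizers (e.g. splitting "fantastic" into
# "fan", "tas", "tic"); as a crude rule of thumb, one English word is
# often around 1.3 tokens. These numbers are illustrative assumptions.

TOKENS_PER_WORD = 1.3      # assumed average; varies by tokenizer and text
CONTEXT_WINDOW = 10_000_000

max_words = int(CONTEXT_WINDOW / TOKENS_PER_WORD)
novels = max_words // 90_000   # ~90,000 words is a typical novel length

print(f"~{max_words:,} words fit in a 10M-token window")
print(f"that is roughly {novels} average-length novels at once")
```

Under these assumptions, the window holds several million words, which is why Scout is pitched at whole-codebase and multi-document tasks rather than single prompts.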
Scout can run on a single NVIDIA H100 GPU, according to Meta. Maverick requires an NVIDIA H100 DGX system or a similar configuration. Behemoth, the most demanding, has 288 billion active parameters, 16 experts, and nearly 2 trillion total parameters, necessitating even more powerful hardware. Meta’s internal benchmarking shows Behemoth outperforming GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro, but not Gemini 2.5 Pro, on several STEM-skill evaluations, such as math problem-solving.
None of the Llama 4 models are true reasoning models like OpenAI’s o1 and o3-mini, which check their answers and usually give more reliable responses, but take longer to reply than non-reasoning models.
Interestingly, Meta says it tuned all its Llama 4 models to refuse to answer contentious questions less often. According to the company, Llama 4 responds to debated political and social topics that the previous crop of Llama models wouldn’t. In addition, the company says Llama 4 is dramatically more balanced in terms of which prompts it flat-out refuses to entertain.
“You can count on Llama 4 to provide helpful, factual answers without judgment,” a Meta spokesperson told TechCrunch. “We’re continuing to make Llama more responsive so that it answers more questions, responds to a variety of viewpoints, and doesn’t favor some views over others.” These changes come at a time when some White House allies are accusing AI chatbots of being unduly politically “woke.”
Many of President Donald Trump’s close confidants, including billionaire Elon Musk and crypto and AI czar David Sacks, have alleged that popular AI chatbots censor conservative views. Sacks has historically singled out OpenAI’s ChatGPT in particular, alleging the chatbot is “programmed to be woke” and untruthful on political subjects.
Bias in AI remains a difficult technical problem, and even Musk’s AI company xAI has struggled to build a chatbot that does not favor certain political views.
OpenAI and others have adjusted their AI models so they answer more questions, especially controversial ones.
Source: Meta releases Llama 4, a new crop of flagship AI models