
Meta exec denies the company artificially boosted Llama 4’s benchmark scores | TechCrunch

A Meta executive on Monday denied a rumor that the company trained its new AI models to perform well on specific benchmarks while concealing the models' weaknesses.

The executive, Ahmad Al-Dahle, VP of generative AI at Meta, said in a post on X that it is "simply not true" that Meta trained its Llama 4 Maverick and Llama 4 Scout models on test sets. In AI benchmarking, a test set is a collection of data used to evaluate a model after it has been trained. Training on the test set can misleadingly inflate a model's benchmark scores, making the model appear more capable than it actually is.
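The problem with training on a test set can be illustrated with a toy sketch (hypothetical code, not Meta's evaluation pipeline): a model that has effectively memorized its evaluation data scores perfectly on that data, while a genuinely held-out test set reveals its true performance.

```python
import random

random.seed(0)

# Toy dataset: points labeled by whether x > 0.5, with 20% label noise.
data = [(random.random(),) for _ in range(200)]
labels = [1 if x[0] > 0.5 else 0 for x in data]
for i in random.sample(range(200), 40):
    labels[i] ^= 1  # flip some labels to simulate noise

train_X, test_X = data[:150], data[150:]
train_y, test_y = labels[:150], labels[150:]

def predict(x, X, y):
    """A 'model' that simply memorizes its training examples (1-nearest neighbor)."""
    nearest = min(range(len(X)), key=lambda i: abs(X[i][0] - x[0]))
    return y[nearest]

def accuracy(X, y, memX, memy):
    hits = sum(predict(x, memX, memy) == t for x, t in zip(X, y))
    return hits / len(X)

# Evaluating on data the model memorized looks deceptively perfect;
# only the held-out test set shows how well it generalizes.
acc_on_training_data = accuracy(train_X, train_y, train_X, train_y)  # always 1.0
acc_on_held_out_data = accuracy(test_X, test_y, train_X, train_y)
```

Here the memorizing model scores 100% whenever it is evaluated on data it has already seen, which is exactly why benchmark test sets must stay out of training data.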

Over the weekend, an unverified rumor that Meta had artificially boosted its new models' benchmark results began circulating on X and Reddit. The rumor appears to have originated from a post on a Chinese social media site by a user claiming to have resigned from Meta in protest of the company's benchmarking practices.

Reports that Maverick and Scout perform poorly on certain tasks fueled the rumor, as did Meta's decision to use an experimental, unreleased version of Maverick to achieve better scores on the benchmark LM Arena. Researchers on X have observed stark differences in the behavior of the publicly downloadable Maverick compared with the model hosted on LM Arena.

Al-Dahle acknowledged that some users have seen "mixed quality" from Maverick and Scout across the different cloud providers hosting the models.

"Since we dropped the models as soon as they were ready, we expect it'll take several days for all the public implementations to get dialed in," Al-Dahle said. "We'll keep working through our bug fixes and onboarding partners."

