
Meta exec denies the company artificially boosted Llama 4’s benchmark scores | TechCrunch

A Meta executive on Monday denied a rumor that the company trained its new AI models to perform well on specific benchmarks while concealing the models' weaknesses.

The executive, Ahmad Al-Dahle, VP of generative AI at Meta, said in a post on X that it is "simply not true" that Meta trained its Llama 4 Maverick and Llama 4 Scout models on test sets. In AI benchmarking, a test set is a collection of data used to evaluate a model after it has been trained. Training on the test set can misleadingly inflate a model's benchmark scores, making the model appear more capable than it actually is.
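The problem with training on a test set can be illustrated with a toy sketch (hypothetical code, not Meta's evaluation pipeline): a model that has effectively memorized its evaluation data scores perfectly on that data, while a genuinely held-out test set reveals its true performance.

```python
import random

random.seed(0)

# Toy dataset: points labeled by whether x > 0.5, with 20% label noise.
data = [(random.random(),) for _ in range(200)]
labels = [1 if x[0] > 0.5 else 0 for x in data]
for i in random.sample(range(200), 40):
    labels[i] ^= 1  # flip some labels to simulate noise

train_X, test_X = data[:150], data[150:]
train_y, test_y = labels[:150], labels[150:]

def predict(x, X, y):
    """A 'model' that simply memorizes its training examples (1-nearest neighbor)."""
    nearest = min(range(len(X)), key=lambda i: abs(X[i][0] - x[0]))
    return y[nearest]

def accuracy(X, y, memX, memy):
    hits = sum(predict(x, memX, memy) == t for x, t in zip(X, y))
    return hits / len(X)

# Evaluating on data the model memorized looks deceptively perfect;
# only the held-out test set shows how well it generalizes.
acc_on_training_data = accuracy(train_X, train_y, train_X, train_y)  # always 1.0
acc_on_held_out_data = accuracy(test_X, test_y, train_X, train_y)
```

Here the memorizing model scores 100% whenever it is evaluated on data it has already seen, which is exactly why benchmark test sets must stay out of training data.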

Over the weekend, an unverified rumor that Meta had artificially boosted its new models' benchmark results began circulating on X and Reddit. The rumor appears to have originated from a post on a Chinese social media site by a user claiming to have resigned from Meta in protest of the company's benchmarking practices.

Reports that Maverick and Scout perform poorly on certain tasks fueled the rumor, as did Meta's decision to use an experimental, unreleased version of Maverick to achieve better scores on the benchmark LM Arena. Researchers on X have observed stark differences in the behavior of the publicly downloadable Maverick compared with the model hosted on LM Arena.

Al-Dahle acknowledged that some users have seen "mixed quality" from Maverick and Scout across the different cloud providers hosting the models.

"Since we dropped the models as soon as they were ready, we expect it'll take several days for all the public implementations to get dialed in," Al-Dahle said. "We'll keep working through our bug fixes and onboarding partners."

