Cohere claims its new Aya Vision AI model is best-in-class | TechCrunch

Cohere For AI, the nonprofit research lab of AI startup Cohere, this week released Aya Vision, a multimodal “open” AI model that the lab claims is best-in-class.

Aya Vision can perform tasks such as writing image captions, answering questions about photos, translating text, and generating summaries in 23 major languages. Cohere is also offering Aya Vision for free through WhatsApp, calling it “a significant step toward making technical breakthroughs accessible to researchers worldwide.”

“While AI has made significant progress, there is still a big gap in how well models perform across different languages, one that becomes even more noticeable in multimodal tasks involving both text and images,” Cohere wrote in a blog post. “Aya Vision aims to explicitly help close that gap.”

Aya Vision comes in two flavors: Aya Vision 32B and Aya Vision 8B. The more sophisticated of the two, Aya Vision 32B, sets a “new frontier,” Cohere said, outperforming models twice its size, including Meta’s Llama-3.2 90B Vision, on certain visual understanding benchmarks. Meanwhile, according to Cohere, Aya Vision 8B scores better on some evaluations than models ten times its size.

Both models are available from the AI dev platform Hugging Face under a Creative Commons 4.0 license with Cohere’s acceptable use addendum. They cannot be used for commercial applications.
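For readers who want to try the weights, here is a minimal sketch of loading the 8B model with the Hugging Face transformers library. The model ID and the exact processor and chat-template details are assumptions based on Cohere’s Hugging Face organization, so check the actual model card before relying on them:

```python
# A minimal sketch of loading Aya Vision 8B via Hugging Face transformers.
# The model ID and chat-template fields below are assumptions; consult the
# model card on Hugging Face for the authoritative usage example.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "CohereForAI/aya-vision-8b"  # assumed repository name

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)

# One multimodal chat turn: an image plus a question in one of the
# 23 supported languages (German here).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},
            {"type": "text", "text": "Beschreibe dieses Bild."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(processor.tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```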

Aya Vision was trained using a “diverse pool” of English datasets, which the lab translated and used to create synthetic annotations, Cohere said. Annotations, also known as tags or labels, help models understand and interpret data during the training process. For example, annotations to train an image recognition model might take the form of markings around objects, or captions referring to each person, place, or object depicted in an image.
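To make the idea concrete, a synthetic annotation of the kind described above might look like the following record. The field names and schema are invented for illustration; this is not Cohere’s actual training format:

```python
# Hypothetical synthetic annotation pairing an image with a caption that was
# written in English and then machine-translated. Schema invented for
# illustration; not Cohere's actual data format.
annotation = {
    "image_path": "images/street_scene.jpg",
    "english_caption": "A cyclist waits at a crosswalk next to a red car.",
    "target_language": "de",
    "translated_caption": (
        "Ein Radfahrer wartet an einem Zebrastreifen neben einem roten Auto."
    ),
    "origin": "synthetic",  # produced by a model, not a human labeler
}
```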

[Image: Cohere’s Aya Vision model can perform a range of visual understanding tasks. Image credits: Cohere]

Cohere’s use of synthetic annotations (that is, annotations generated by AI) is on trend. Despite its potential downsides, rivals including OpenAI are increasingly leveraging synthetic data to train models as the well of real-world data dries up. Research firm Gartner estimates that 60% of the data used for AI and analytics projects last year was synthetically created.

According to Cohere, training Aya Vision on synthetic annotations allowed the lab to use fewer resources while achieving competitive performance.

“This showcases our critical focus on efficiency and [doing] more with less compute,” Cohere wrote in its blog. “This also enables greater support for the research community, who often have more limited access to compute resources.”

Together with Aya Vision, Cohere has released a new benchmark suite, AyaVisionBench, designed to probe a model’s skills in “vision-language” tasks, such as identifying differences between two images and converting screenshots into code.
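As a sketch, the suite should be loadable with the Hugging Face datasets library roughly as follows. The dataset ID, split name, and schema are assumptions, so consult the dataset card for the real details:

```python
# A minimal sketch of pulling AyaVisionBench with the Hugging Face `datasets`
# library. The dataset ID and split are assumptions based on Cohere's Hugging
# Face organization; check the dataset card for the real schema and configs.
from datasets import load_dataset

bench = load_dataset("CohereForAI/AyaVisionBench", split="test")  # assumed ID/split

# Inspect a few examples; each should pair an image with a prompt in one of
# the 23 covered languages (field names vary, so print the keys first).
for example in bench.select(range(3)):
    print(example.keys())
```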

The AI industry is in the midst of what some have called an “evaluation crisis,” a consequence of the popularization of benchmarks that give aggregate scores that correlate poorly with proficiency on the tasks most AI users care about. Cohere asserts that AyaVisionBench is a step toward rectifying this, providing a “broad and challenging” framework for assessing a model’s cross-lingual and multimodal understanding.

With any luck, that is indeed the case.

“[T]he dataset serves as a robust benchmark for evaluating vision-language models in multilingual and real-world settings,” Cohere researchers wrote in a post on Hugging Face. “We make this evaluation set available to the research community to push forward multilingual multimodal evaluations.”
