The common belief is that only companies like Google, OpenAI, and Anthropic, with deep pockets and top researchers, can create cutting-edge foundation models. But as one of those companies famously put it, they “have no moat” – and Ai2 has proven the point with the release of Molmo, a multimodal AI model that rivals the best in the industry while being small, free, and truly open source.

To be clear, Molmo (multimodal open language model) is a visual understanding engine, not a full-service chatbot like ChatGPT. It has no API, is not ready for enterprise integration, and does not browse the web. Think of it as the part of such a model that sees an image, understands it, and can describe or answer questions about it.

Molmo comes in 72B-, 7B-, and 1B-parameter variants and, like other multimodal models, can identify and answer questions about almost any everyday situation or object: “How do you work this coffee maker?” “How many dogs in this picture have their tongues out?” “Which options on this menu are vegan?” “What are the variables in this diagram?” These are tasks we have seen demonstrated with varying degrees of success and latency over the years.

What sets Molmo apart is not so much its capabilities as how it achieves them. Visual understanding is a broad domain, ranging from counting sheep in a field to guessing a person’s emotional state to summarizing a menu, which makes it difficult to describe and test quantitatively. Even so, as Ai2 CEO Ali Farhadi explained at a demo event at the organization’s Seattle headquarters, it is possible to show that two models are similar in their capabilities.

“One thing that we are demonstrating today is that open is equal to closed,” he said. “And small is now equal to big.” (He clarified that he meant equivalency rather than identity, an important distinction for some.)

In AI development, the mantra has long been “bigger is better”: more training data, more parameters in the resulting model, and more computing power to build and run it. But at some point a model simply cannot be made any bigger – there is not enough data, or the compute costs and training time become prohibitive. At that point you have to make do with what you have, or better yet, do more with less.

Farhadi explained that Molmo, while performing on par with models like GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet, is by best estimates only about one-tenth their size. And it approaches that level of capability in a variant just one-tenth of that size again.

“There are many different benchmarks that people use to evaluate models. I don’t like this approach scientifically, but I had to provide a number for people,” he said. “Our largest model, the 72B, is actually a small model, and it outperforms many larger models.”
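Because the weights are openly released, the kind of visual Q&A described above can be run locally rather than through a hosted service. Below is a minimal sketch assuming the 7B checkpoint is published on Hugging Face as allenai/Molmo-7B-D-0924 and exposes the custom processor and generation interface from its model card; the repo name, the placeholder image URL, and the methods processor.process and model.generate_from_batch are assumptions to verify against that card.

    import requests
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

    MODEL_ID = "allenai/Molmo-7B-D-0924"  # assumed Hugging Face repo name

    # trust_remote_code is needed because Molmo ships its own modeling code.
    processor = AutoProcessor.from_pretrained(
        MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="auto"
    )
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="auto"
    )

    # Any everyday photo works; this URL is a placeholder.
    image = Image.open(requests.get("https://example.com/dogs.jpg", stream=True).raw)

    # Pack one image-plus-question example and move it to the model's device.
    inputs = processor.process(
        images=[image],
        text="How many dogs in this picture have their tongues out?",
    )
    inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

    # Generate an answer and decode only the newly produced tokens.
    output = model.generate_from_batch(
        inputs,
        GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
        tokenizer=processor.tokenizer,
    )
    answer = processor.tokenizer.decode(
        output[0, inputs["input_ids"].size(1):], skip_special_tokens=True
    )
    print(answer)

The local-inference route is what "truly open source" buys you here: no API key or hosted endpoint, just published weights and a consumer-grade GPU for the smaller variants.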
Read More @ techcrunch.com