Alibaba releases new visual reasoning model that can see, understand, and think

Alibaba, the Chinese tech giant, has announced QVQ-Max, a new visual reasoning model in its Qwen AI family. What makes this model interesting is that it can understand the content of photos and videos, then analyse and reason about that information to provide solutions.

With this model, Alibaba says that it’s bridging the gap between text-based AI models and real-world information. Thanks to visual reasoning, the company claims, the model can see, understand, and think about things in the world. The Chinese firm said the model excels at parsing images and identifying key elements, and that it is versatile enough for uses including illustration design, video script generation, and role-playing.

Like other AI chatbots, QVQ-Max can help you with tasks at work, in education, or in your personal life. Its visual capabilities, however, let it take on even more tasks in these areas, such as solving math and physics problems that come with diagrams or guiding you through cooking a dish based on recipe images.

Alibaba called QVQ-Max just the first iteration of the model and has outlined how it plans to improve it in upcoming versions. First, it wants to improve image recognition accuracy through grounding techniques that validate the model’s observations. Second, it wants to make the model better at handling multi-step tasks and complex problems so that it can operate phones and computers and play games. Lastly, it plans to expand the model beyond text-based interactions to include tool verification and visual generation.

To get started with QVQ-Max, head over to chat.qwen.ai, then open the model dropdown in the top left, press ‘Expand more models’ and select QVQ-Max. After this, go to the chat box and get started. Don’t forget to attach an image or video to see what it can do.