2026-05-07 By Jonah Quinn 5 min read
Z.ai’s new ImageMining benchmark asks multimodal agents to inspect images, crop details, search outward, and reason across sources. That is a better test for many real visual workflows than another captioning score.
2026-04-28 By Nico Sable 5 min read
NVIDIA’s new open multimodal model is pitched as a cheaper perception layer for agents that need to read screens, documents, video, and audio without stitching four models together.
2026-04-07 By Nico Sable 3 min read
The launch of Llama 4 Maverick and Scout is thrilling for the open ecosystem, promising MoE scale and multimodality. Now builders need to stop clapping and start testing the models against hardware reality.
2026-03-26 By Jonah Quinn 4 min read
Gemini Robotics and Gemini Robotics-ER bring multimodal reasoning to robots. The lesson isn't that a robot butler is arriving tomorrow, but that embodied AI leaves no room for demo theater.
2026-03-12 By Nico Sable 3 min read
Mistral Small 3.1 proves that the most important open models aren't the largest ones, but the ones you can actually afford to deploy locally.