Simon Willison's Weblog

Vision Language Models (Better, Faster, Stronger). Extremely useful review of the last year in vision and multi-modal LLMs.

So much has happened! I'm particularly excited about the range of small open-weight vision models that are now available. Models like gemma3-4b-it and Qwen2.5-VL-3B-Instruct produce very impressive results and run happily on mid-range consumer hardware.

Posted 13th May 2025 at 3:25 pm

Tags: ai, generative-ai, local-llms, llms, hugging-face, vision-llms
