[ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning (Python; updated Aug 8, 2025)
Command-line tool for extracting DINOv3, CLIP, SigLIP2, and RADIO features from images and videos.
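A feature-extraction tool like the one above typically embeds each image with a chosen backbone and L2-normalizes the result for cosine-similarity search. A minimal sketch of the SigLIP2 path, assuming the standard Hugging Face transformers API (the helper names are ours, and the public google checkpoint is used as an illustrative default):

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def extract_siglip2_features(image_path, model_id="google/siglip2-base-patch16-224"):
    # Heavy dependencies (torch, transformers, Pillow) are imported locally so the
    # pure-Python helper above stays usable without them. This is a sketch, not
    # the CLI tool's actual implementation.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModel

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    inputs = processor(images=Image.open(image_path).convert("RGB"), return_tensors="pt")
    with torch.no_grad():
        embedding = model.get_image_features(**inputs)[0].tolist()
    return l2_normalize(embedding)  # unit-norm embedding, ready for indexing
```

Normalizing up front means an index of these vectors can rank matches with a plain dot product, which is how most local image-search engines built on such embeddings work.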
deepfake-detector-model-v1 is a vision-language encoder model fine-tuned from siglip2-base-patch16-512 for binary deepfake image classification. It is trained to detect whether an image is real or generated using synthetic media techniques. The model uses the SiglipForImageClassification architecture.
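The SigLIP2-based classifiers in this list share one inference pattern: preprocess the image, run `SiglipForImageClassification`, and softmax the logits into per-label scores. A minimal sketch under that assumption (the default model id below is the repo's short name, not a verified hub id, and should be replaced with the actual checkpoint path):

```python
import math


def softmax(logits):
    """Turn raw logits into probabilities (what the repos do via torch.softmax)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image_path, model_id="deepfake-detector-model-v1"):
    # torch / transformers / Pillow are imported locally so the helper above
    # stays importable without them; this mirrors the standard transformers
    # image-classification flow rather than any repo's exact code.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, SiglipForImageClassification

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = SiglipForImageClassification.from_pretrained(model_id)
    inputs = processor(images=Image.open(image_path).convert("RGB"), return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    probs = softmax(logits)
    return {model.config.id2label[i]: p for i, p in enumerate(probs)}
```

The same function covers the binary detectors (deepfake, watermark) and the multi-class ones (NSFW, emotion, action, age); only the checkpoint and its `id2label` mapping change.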
Local image search engine powered by AI
Facial-Emotion-Detection-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224
nsfw-image-detection is a vision-language encoder model fine-tuned from siglip2-base-patch16-256 for multi-class image classification. Built on the SiglipForImageClassification architecture, the model is trained to identify and categorize content types in images, especially for explicit, suggestive, or safe media filtering.
Find any moment in your videos using Voice & Visuals. A full-stack AI search engine.
Contrastive Olfaction-Language-Image Pre-training Model. The first-ever series of embeddings models for olfaction-vision-language applications in robotics and embodied AI - an extension of CLIP with olfaction.
This repository offers tools and guidance for fine-tuning the SigLIP2 Vision Transformer (ViT) model. It includes scripts and best practices for adapting the model to custom datasets and tasks. Designed for researchers and developers, it aims at efficient fine-tuning and strong performance in vision-based applications.
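Fine-tuning SigLIP2 for classification, as the repositories above do, usually means attaching a fresh classification head sized to the dataset's labels and training with the transformers `Trainer`. A hedged sketch of that setup (hyperparameters are illustrative defaults, not any repo's actual recipe):

```python
def build_label_maps(labels):
    """Build the id2label/label2id dicts that transformers configs expect."""
    id2label = {i: label for i, label in enumerate(sorted(set(labels)))}
    label2id = {label: i for i, label in id2label.items()}
    return id2label, label2id


def finetune(train_dataset, labels, model_id="google/siglip2-base-patch16-224"):
    # Heavy dependencies are imported locally; train_dataset is assumed to yield
    # dicts with "pixel_values" and "labels" keys already preprocessed.
    from transformers import SiglipForImageClassification, Trainer, TrainingArguments

    id2label, label2id = build_label_maps(labels)
    model = SiglipForImageClassification.from_pretrained(
        model_id,
        num_labels=len(id2label),
        id2label=id2label,
        label2id=label2id,
        ignore_mismatched_sizes=True,  # re-initialise the classifier head
    )
    args = TrainingArguments(
        output_dir="siglip2-finetuned",
        per_device_train_batch_size=16,
        learning_rate=5e-5,
        num_train_epochs=3,
    )
    Trainer(model=model, args=args, train_dataset=train_dataset).train()
    return model
```

Passing `id2label`/`label2id` at load time is what makes the resulting checkpoint emit human-readable labels at inference, as the classifier repos in this list do.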
This is an extended version of the paper "TIPS Over Tricks: Simple Prompts for Effective Zero-shot Anomaly Detection," accepted at ICASSP 2026 and scheduled to be publicly available in May. Author contributions may vary between versions.
Watermark-Detection-SigLIP2 is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for binary image classification. It is trained to detect whether an image contains a watermark or not, using the SiglipForImageClassification architecture.
Human-Action-Recognition is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-class human action recognition. It uses the SiglipForImageClassification architecture to predict human activities from still images.
Age-Classification-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to predict the age group of a person from an image using the SiglipForImageClassification architecture.
siglip2-mini-explicit-content is an image classification vision-language encoder model fine-tuned from siglip2-base-patch16-512 for a single-label classification task. It is designed to classify images into categories related to explicit, sensual, or safe-for-work content using the SiglipForImageClassification architecture.
SigLIP2 is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224.
Augmented-Waste-Classifier-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224.
Fire-Detection-Siglip2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to detect fire, smoke, or normal conditions using the SiglipForImageClassification architecture.
Coral-Health is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify coral reef images into two health conditions using the SiglipForImageClassification architecture.