Open WebUI Tools Collection
A modular collection of tools, function pipes, and filters to supercharge your Open WebUI experience.
Transform your Open WebUI instance into a powerful AI workstation with this comprehensive toolkit. From academic research and image generation to music creation and autonomous agents, this collection provides everything you need to extend your AI capabilities.
What's Inside
This repository contains 20+ specialized tools and functions designed to enhance your Open WebUI experience:
Tools
- arXiv Search - Academic paper discovery (no API key required!)
- Perplexica Search - Web search using Perplexica API with citations
- Pexels Media Search - High-quality photos and videos from Pexels API
- YouTube Search & Embed - Search YouTube and play videos in embedded player
- Native Image Generator - Direct Open WebUI image generation with Ollama model management
- Hugging Face Image Generator - AI-powered image creation
- ComfyUI Image-to-Image (Qwen Edit 2509) - Advanced image editing with multi-image support
- ComfyUI ACE Step 1.5 Audio - Advanced music generation (New)
- ComfyUI ACE Step Audio (Legacy) - Advanced music generation
- ComfyUI Text-to-Video - Generate short videos from text using ComfyUI (default WAN 2.2 workflow)
- Flux Kontext ComfyUI - Professional image editing
- OpenWeatherMap Forecast Tool - Interactive weather widget with current conditions and forecasts
Function Pipes
- Planner Agent v2 - Advanced autonomous agent with specialized models, interactive guidance, and comprehensive execution management
- arXiv Research MCTS - Advanced research with Monte Carlo Tree Search
- Multi Model Conversations - Multi-agent discussions
- Resume Analyzer - Professional resume analysis
- Mopidy Music Controller - Music server management
- Letta Agent - Autonomous agent integration
- Perplexica Pipe - AI-powered web search with streaming responses and citations
- Google Veo Text-to-Video & Image-to-Video - Generate videos from text or a single image using Google Veo (only one image supported as input)
Filters
- Prompt Enhancer - Automatic prompt improvement
- Semantic Router - Intelligent model selection
- Full Document - File processing capabilities
- Clean Thinking Tags - Conversation cleanup
- OpenRouter WebSearch Citations - Enable web search for OpenRouter models with citation handling
Quick Start
Option 1: Open WebUI Hub (Recommended)
- Visit https://openwebui.com/u/haervwe
- Browse the collection and click "Get" for desired tools
- Follow the installation prompts in your Open WebUI instance
Option 2: Manual Installation
- Copy `.py` files from `tools/`, `functions/`, or `filters/` directories
- Navigate to Open WebUI Workspace > Tools/Functions/Filters
- Paste the code, provide a name and description, then save
Key Features
- Plug-and-Play: Most tools work out of the box with minimal configuration
- Visual Integration: Seamless integration with ComfyUI workflows
- AI-Powered: Advanced features like MCTS research and autonomous planning
- Academic Focus: arXiv integration for research and academic work
- Creative Tools: Music generation and image editing capabilities
- Smart Routing: Intelligent model selection and conversation management
- Document Processing: Full document analysis and resume processing
Prerequisites
- Open WebUI: Version 0.6.0+ recommended
- Python: 3.8 or higher
- Optional Dependencies:
- ComfyUI (for image/music generation tools)
- Mopidy (for music controller)
- Various API keys (Hugging Face, Tavily, etc.)
Configuration
Most tools are designed to work with minimal configuration. Key configuration areas:
- API Keys: Required for some tools (Hugging Face, Tavily, etc.)
- ComfyUI Integration: For image and music generation tools
- Model Selection: Choose appropriate models for your use case
- Filter Setup: Enable filters in your model configuration
Detailed Documentation
Table of Contents
- arXiv Search Tool
- Perplexica Search Tool
- Pexels Media Search Tool
- YouTube Search & Embed Tool
- Native Image Generator
- Hugging Face Image Generator
- Cloudflare Workers AI Image Generator
- SearxNG Image Search Tool
- ComfyUI Image-to-Image Tool (Qwen Image Edit 2509)
- ComfyUI ACE Step 1.5 Audio Tool
- ComfyUI ACE Step Audio Tool (Legacy)
- ComfyUI Text-to-Video Tool
- OpenWeatherMap Forecast Tool
- Flux Kontext ComfyUI Pipe
- Google Veo Text-to-Video & Image-to-Video Pipe
- Planner Agent v2
- arXiv Research MCTS Pipe
- Multi Model Conversations Pipe
- Resume Analyzer Pipe
- Mopidy Music Controller
- Letta Agent Pipe
- Perplexica Pipe
- OpenRouter Image Pipe
- OpenRouter WebSearch Citations Filter
- Prompt Enhancer Filter
- Semantic Router Filter
- Full Document Filter
- Clean Thinking Tags Filter
- Using the Provided ComfyUI Workflows
- Installation
- Contributing
- License
- Credits
- Support
Tools
arXiv Search Tool
Description
Search arXiv.org for relevant academic papers on any topic. No API key required!
Configuration
- No configuration required. Works out of the box.
Usage
- Example: Search for recent papers about "tree of thought"
- Returns up to 5 most relevant papers, sorted by most recent.
Example arXiv search result in Open WebUI
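For reference, the query itself is just a call to arXiv's public Atom API. The snippet below is a minimal standalone sketch of that request using only the Python standard library; it is not the tool's exact code.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

def search_arxiv(query, max_results=5):
    """Query the public arXiv Atom API, newest submissions first."""
    params = urllib.parse.urlencode({
        "search_query": f"all:{query}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    })
    with urllib.request.urlopen(f"http://export.arxiv.org/api/query?{params}") as resp:
        feed = ET.fromstring(resp.read())
    ns = {"atom": "http://www.w3.org/2005/Atom"}
    return [
        {
            "title": entry.findtext("atom:title", default="", namespaces=ns).strip(),
            "link": entry.findtext("atom:id", default="", namespaces=ns),
            "published": entry.findtext("atom:published", default="", namespaces=ns),
        }
        for entry in feed.findall("atom:entry", ns)
    ]

for paper in search_arxiv("tree of thought"):
    print(paper["published"], paper["title"])
```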
Perplexica Search Tool
Description
Search the web for factual information, current events, or specific topics using the Perplexica API. This tool provides comprehensive search results with citations and sources, making it ideal for research and information gathering. Perplexica is an open-source AI-powered search engine and alternative to Perplexity AI that must be self-hosted locally. It uses advanced language models to provide accurate, contextual answers with proper source attribution.
Configuration
- `BASE_URL` (str): Base URL for the Perplexica API (default: `http://host.docker.internal:3001`)
- `OPTIMIZATION_MODE` (str): Search optimization mode - "speed" or "balanced" (default: `balanced`)
- `CHAT_MODEL` (str): Default chat model for search processing (default: `llama3.1:latest`)
- `EMBEDDING_MODEL` (str): Default embedding model for search (default: `bge-m3:latest`)
- `OLLAMA_BASE_URL` (str): Base URL for Ollama API (default: `http://host.docker.internal:11434`)
Prerequisites: You must have Perplexica installed and running locally at the configured URL. Perplexica is a self-hosted open-source search engine that requires Ollama with the specified chat and embedding models. Follow the installation instructions in the Perplexica repository to set up your local instance.
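Under the hood the tool POSTs to Perplexica's `/api/search` endpoint. The sketch below shows the general shape of that request; the payload field names are assumptions based on the valves above and may differ between Perplexica versions, so check your instance's API docs.

```python
import requests

PERPLEXICA_URL = "http://host.docker.internal:3001/api/search"  # BASE_URL valve + /api/search

def perplexica_search(query):
    # Field names are assumptions mirroring the valves above; verify against your
    # Perplexica version's API documentation.
    payload = {
        "query": query,
        "focusMode": "webSearch",
        "optimizationMode": "balanced",                                      # OPTIMIZATION_MODE
        "chatModel": {"provider": "ollama", "name": "llama3.1:latest"},      # CHAT_MODEL
        "embeddingModel": {"provider": "ollama", "name": "bge-m3:latest"},   # EMBEDDING_MODEL
    }
    resp = requests.post(PERPLEXICA_URL, json=payload, timeout=120)
    resp.raise_for_status()
    # The response typically contains an answer plus a list of cited sources.
    return resp.json()

result = perplexica_search("latest developments in AI safety research 2024")
print(result.get("message"), "-", len(result.get("sources", [])), "sources")
```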
Usage
- Example: Search for "latest developments in AI safety research 2024"
- Returns comprehensive search results with proper citations
- Automatically emits citations for source tracking in Open WebUI
- Provides both summary and individual source links
Features
- Web Search Integration: Direct access to current web information
- Citation Support: Automatic citation generation for Open WebUI
- Model Flexibility: Configurable chat and embedding models
- Real-time Status: Progress updates during search execution
- Source Tracking: Individual source citations with metadata
Pexels Media Search Tool
Description
Search and retrieve high-quality photos and videos from the Pexels API. This tool provides access to Pexels' extensive collection of free stock photos and videos, with comprehensive search capabilities, automatic citation generation, and direct image display in chat. Perfect for finding professional-quality media for presentations, content creation, or creative projects.
Configuration
- `PEXELS_API_KEY` (str): Free Pexels API key from https://www.pexels.com/api/ (required)
- `DEFAULT_PER_PAGE` (int): Default number of results per search (default: 5, recommended for LLMs)
- `MAX_RESULTS_PER_PAGE` (int): Maximum allowed results per page (default: 15, prevents overwhelming LLMs)
- `DEFAULT_ORIENTATION` (str): Default photo orientation - "all", "landscape", "portrait", or "square" (default: "all")
- `DEFAULT_SIZE` (str): Default minimum photo size - "all", "large" (24MP), "medium" (12MP), or "small" (4MP) (default: "all")
Prerequisites: Get a free API key from Pexels API and configure it in the tool's Valves settings.
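For context, the underlying Pexels REST API is a simple keyed GET request. Here is a hedged sketch of the kind of photo search the tool performs (result fields trimmed for brevity; not the tool's actual implementation):

```python
import requests

PEXELS_API_KEY = "your_pexels_api_key"

def search_photos(query, per_page=5, orientation=""):
    """Search Pexels photos and return simplified results with attribution."""
    params = {"query": query, "per_page": per_page}
    if orientation:
        params["orientation"] = orientation  # "landscape", "portrait", or "square"
    resp = requests.get(
        "https://api.pexels.com/v1/search",
        headers={"Authorization": PEXELS_API_KEY},  # Pexels expects the bare key
        params=params,
        timeout=30,
    )
    resp.raise_for_status()
    return [
        {"photographer": p["photographer"], "page": p["url"], "image": p["src"]["large"]}
        for p in resp.json()["photos"]
    ]

for photo in search_photos("modern office workspace", per_page=3, orientation="landscape"):
    print(f"![Photo by {photo['photographer']}]({photo['image']})")
```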
Usage
- Photo Search Example: Search for photos of "modern office workspace"
- Video Search Example: Search for videos of "ocean waves at sunset"
- Curated Photos Example: Get curated photos from Pexels
Features
- Three Search Functions: `search_photos`, `search_videos`, and `get_curated_photos`
- Direct Image Display: Images are automatically formatted with markdown for immediate display in chat
- Advanced Filtering: Filter by orientation, size, color, and quality
- Attribution Support: Automatic citation generation with photographer credits
- Rate Limit Handling: Built-in error handling for API limits and invalid keys
- LLM Optimized: Results are limited and formatted to prevent overwhelming language models
- Real-time Status: Progress updates during search execution
YouTube Search & Embed Tool
Description
Search YouTube for videos and display them in a beautiful embedded player directly in your Open WebUI chat. This tool provides comprehensive YouTube search capabilities with automatic citation generation, detailed video information, and a custom-styled embedded player. Perfect for finding tutorials, music videos, educational content, or any video content you need.
Configuration
- `YOUTUBE_API_KEY` (str): YouTube Data API v3 key from https://console.cloud.google.com/apis/credentials (required)
- `MAX_RESULTS` (int): Maximum number of search results to return (default: 5, range: 1-10)
- `SHOW_EMBEDDED_PLAYER` (bool): Show embedded YouTube player for the first result (default: `True`)
- `REGION_CODE` (str): Region code for search results, e.g., "US", "GB", "JP" (default: "US")
- `SAFE_SEARCH` (str): Safe search filter - "none", "moderate", or "strict" (default: "moderate")
Prerequisites: Get a free YouTube Data API v3 key from Google Cloud Console and enable the YouTube Data API v3 in your project.
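The search itself maps onto a single YouTube Data API v3 request. A minimal sketch of that call (illustrative only, not the tool's code):

```python
import requests

YOUTUBE_API_KEY = "your_youtube_data_api_v3_key"

def search_youtube(query, max_results=5, region="US", safe_search="moderate"):
    """Search YouTube via the Data API v3 and return video titles, IDs, and URLs."""
    resp = requests.get(
        "https://www.googleapis.com/youtube/v3/search",
        params={
            "part": "snippet",
            "q": query,
            "type": "video",
            "maxResults": max_results,
            "regionCode": region,
            "safeSearch": safe_search,
            "key": YOUTUBE_API_KEY,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return [
        {
            "title": item["snippet"]["title"],
            "videoId": item["id"]["videoId"],
            "url": "https://www.youtube.com/watch?v=" + item["id"]["videoId"],
        }
        for item in resp.json()["items"]
    ]

for video in search_youtube("python tutorial for beginners"):
    print(video["title"], "->", video["url"])
```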
Usage
- Search for Videos: Search YouTube for "python tutorial for beginners"
- Play Specific Video: Play YouTube video dQw4w9WgXcQ
- Search with Custom Results: Search YouTube for "cooking recipes" with 10 results
Features
- Two Main Functions: `search_youtube` for searching and `play_video` for playing specific video IDs
- Embedded Player: Beautiful custom-styled YouTube player embedded directly in chat with responsive design
- Safe Search: Built-in content filtering options
- Region Support: Localized search results based on region code
- Direct Links: Provides YouTube links and "Watch on YouTube" buttons
- Rate Limit Handling: Proper error handling for API quota limits
- Real-time Status: Progress updates during search and loading
Getting Started
- Get a YouTube API Key:
  - Visit Google Cloud Console
  - Create a new project or select an existing one
  - Enable the "YouTube Data API v3"
  - Create credentials (API Key)
  - Copy the API key
- Configure the Tool:
  - Open the tool's Valves settings in Open WebUI
  - Paste your API key into the `YOUTUBE_API_KEY` field
  - Adjust other settings as desired (region, max results, etc.)
- Start Searching:
  - Use natural language: "Search YouTube for [topic]"
  - Or use the function directly: `search_youtube("topic")`
Example of YouTube video embedded in Open WebUI chat
Native Image Generator
Description
Generate images using Open WebUI's native image generation middleware configured in admin settings. This tool leverages whatever image generation backend you have configured (such as AUTOMATIC1111, ComfyUI, or OpenAI DALL-E) through Open WebUI's built-in image generation system, with optional Ollama model management to free up VRAM when needed.
Configuration
- `unload_ollama_models` (bool): Whether to unload all Ollama models from VRAM before generating images (default: `False`)
- `ollama_url` (str): Ollama API URL for model management (default: `http://host.docker.internal:11434`)
- `emit_embeds` (bool): Whether to emit HTML image embeds via the `embeds` event so generated images are displayed inline in the chat (default: `True`). When `False`, the tool will skip emitting embeds and only return bare download URLs. If `emit_embeds` is `True` but no event emitter is available, images cannot be displayed inline and only the URLs will be returned.
Prerequisites: You must have image generation configured in Open WebUI's admin settings under Settings > Images. This tool works with any image generation backend you have set up (AUTOMATIC1111, ComfyUI, OpenAI, etc.).
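The optional VRAM-unloading step relies on standard Ollama endpoints: `/api/ps` to list loaded models and `/api/generate` with `keep_alive: 0` to evict one. A rough sketch of that behavior (not the tool's exact code):

```python
import requests

OLLAMA_URL = "http://host.docker.internal:11434"  # matches the ollama_url valve default

def unload_all_ollama_models(base_url=OLLAMA_URL):
    """Ask Ollama to evict every model currently loaded in VRAM."""
    # /api/ps lists the models currently resident in memory
    loaded = requests.get(f"{base_url}/api/ps", timeout=10).json().get("models", [])
    for model in loaded:
        # A generate call with keep_alive=0 (and no prompt) unloads the model immediately
        requests.post(
            f"{base_url}/api/generate",
            json={"model": model["name"], "keep_alive": 0},
            timeout=60,
        )

unload_all_ollama_models()
```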
Usage
- Example: Generate an image of "a serene mountain landscape at sunset"
- Uses whatever image generation backend is configured in Open WebUI admin settings
- Automatically manages model resources if Ollama unloading is enabled
- Returns markdown-formatted image links for immediate display
Features
- Native Integration: Uses Open WebUI's native image generation middleware without external dependencies
- Backend Agnostic: Works with any image generation backend configured in admin settings (AUTOMATIC1111, ComfyUI, OpenAI, etc.)
- Memory Management: Optional Ollama model unloading to optimize VRAM usage
- Flexible Model Support: You can prompt the agent to change the image generation model, provided the model name is given to it.
- Real-time Status: Provides generation progress updates via event emitter
- Error Handling: Comprehensive error reporting and recovery
Hugging Face Image Generator
Description
Generate high-quality images from text descriptions using Hugging Face's Stable Diffusion models.
Configuration
- API Key (Required): Obtain a Hugging Face API key from your HuggingFace account and set it in the tool's configuration in Open WebUI.
- API URL (Optional): Uses Stability AI's SD 3.5 Turbo model as default. Can be customized to use other HF text-to-image model endpoints.
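For reference, the underlying call is a single POST to the Hugging Face Inference API that returns raw image bytes. A minimal sketch, assuming the SD 3.5 Large Turbo endpoint as the default (the exact default URL used by the tool may differ):

```python
import requests

HF_API_KEY = "hf_your_api_key"
# Assumed default endpoint; any HF text-to-image model endpoint works here
HF_API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-3.5-large-turbo"

def generate_image(prompt, out_path="image.png"):
    """POST a prompt to the HF Inference API and save the returned image bytes."""
    resp = requests.post(
        HF_API_URL,
        headers={"Authorization": f"Bearer {HF_API_KEY}"},
        json={"inputs": prompt},
        timeout=120,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)
    return out_path

generate_image("beautiful horse running free")
```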
Usage
- Example: Create an image of "beautiful horse running free"
- Multiple image format options: Square, Landscape, Portrait, etc.
Example image generated with Hugging Face tool
Cloudflare Workers AI Image Generator
Description
Generate images using Cloudflare Workers AI text-to-image models, including FLUX, Stable Diffusion XL, SDXL Lightning, and DreamShaper LCM. This tool provides model-specific prompt preprocessing, parameter optimization, and direct image display in chat. It supports fast and high-quality image generation with minimal configuration.
Configuration
- `cloudflare_api_token` (str): Your Cloudflare API Token (required)
- `cloudflare_account_id` (str): Your Cloudflare Account ID (required)
- `default_model` (str): Default model to use (e.g., `@cf/black-forest-labs/flux-1-schnell`)
Prerequisites: Obtain a Cloudflare API Token and Account ID from your Cloudflare dashboard. No additional dependencies beyond requests.
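The generation request itself is a single call to the Workers AI REST endpoint. A hedged sketch follows; the response handling is simplified (FLUX models return base64 JSON while some Stable Diffusion models return raw PNG bytes), and it is not the tool's exact code:

```python
import base64
import requests

ACCOUNT_ID = "your_cloudflare_account_id"
API_TOKEN = "your_cloudflare_api_token"
MODEL = "@cf/black-forest-labs/flux-1-schnell"

def generate_image(prompt, out_path="image.png"):
    url = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}"
    resp = requests.post(
        url,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"prompt": prompt},
        timeout=120,
    )
    resp.raise_for_status()
    # FLUX responds with JSON carrying a base64 image; SDXL-style models return raw bytes
    if resp.headers.get("content-type", "").startswith("application/json"):
        image_bytes = base64.b64decode(resp.json()["result"]["image"])
    else:
        image_bytes = resp.content
    with open(out_path, "wb") as f:
        f.write(image_bytes)
    return out_path

generate_image("A futuristic cityscape at sunset, vibrant colors")
```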
Usage
- Example: `await tools.generate_image(prompt="A futuristic cityscape at sunset, vibrant colors")`
- Returns a markdown-formatted image link for immediate display in chat.
Features
- Multiple Models: Supports FLUX, SDXL, SDXL Lightning, DreamShaper LCM
- Prompt Optimization: Automatic prompt enhancement for best results per model
- Parameter Handling: Smart handling of steps, guidance, negative prompts, and size
- Direct Image Display: Returns markdown image links for chat
- Error Handling: Comprehensive error and status reporting
- Real-time Status: Progress updates via event emitter
SearxNG Image Search Tool
Description
Search and retrieve images from the web using a self-hosted SearxNG instance. This tool provides privacy-respecting, multi-engine image search with direct image display in chat. Ideal for finding diverse images from multiple sources without tracking or ads.
Configuration
- `SEARXNG_ENGINE_API_BASE_URL` (str): The base URL for the SearxNG search engine API (default: `http://searxng:4000/search`)
- `MAX_RESULTS` (int): Maximum number of images to return per search (default: 5)
Prerequisites: You must have a running SearxNG instance. See SearxNG documentation for setup instructions.
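Under the hood this is a plain GET against your SearxNG instance with `format=json` (the JSON output format must be enabled in your SearxNG settings). A minimal sketch of the kind of request the tool makes, not its exact code:

```python
import requests

SEARXNG_URL = "http://searxng:4000/search"  # SEARXNG_ENGINE_API_BASE_URL valve

def search_images(query, max_results=5):
    """Return markdown image links from a SearxNG JSON image search."""
    resp = requests.get(
        SEARXNG_URL,
        params={"q": query, "categories": "images", "format": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])[:max_results]
    return [f"![{r.get('title', 'image')}]({r['img_src']})" for r in results]

print("\n".join(search_images("cats", max_results=3)))
```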
Usage
- Example: `await tools.search_images(query="cats", max_results=3)`
- Returns a list of markdown-formatted image links for immediate display in chat.
Features
- Privacy-Respecting: No tracking, ads, or profiling
- Multi-Engine: Aggregates results from multiple search engines
- Direct Image Display: Images are formatted for chat display
- Customizable: Choose engines, result count, and more
- Error Handling: Handles connection and search errors gracefully
Function Pipes
Perplexica Pipe
Description
AI-powered web search using Perplexica with streaming responses, intelligent citations, and comprehensive source tracking. This function pipe integrates with your self-hosted Perplexica instance to provide real-time web search capabilities with proper source attribution, making it perfect for research, fact-checking, and staying up-to-date with current events.
Configuration
- `enable_perplexica` (bool): Enable or disable Perplexica search (default: `True`)
- `perplexica_api_url` (str): Perplexica API endpoint (default: `http://localhost:3001/api/search`)
- `perplexica_chat_provider` (str): Provider ID for chat model (default: `550e8400-e29b-41d4-a716-446655440000`)
- `perplexica_chat_model` (str): Chat model to use (default: `gpt-4o-mini`)
- `perplexica_embedding_provider` (str): Provider ID for embeddings (default: `550e8400-e29b-41d4-a716-446655440000`)
- `perplexica_embedding_model` (str): Embedding model to use (default: `text-embedding-3-large`)
- `perplexica_focus_mode` (str): Search focus mode (default: `webSearch`)
- `perplexica_optimization_mode` (str): Optimization mode - "speed" or "balanced" (default: `balanced`)
- `task_model` (str): Model for non-search tasks (default: `gpt-4o-mini`)
- `max_history_pairs` (int): Maximum conversation history pairs to include (default: 12)
- `perplexica_timeout_ms` (int): HTTP socket read timeout in milliseconds (default: 1500)
Prerequisites: You must have Perplexica installed and running locally. Perplexica is an open-source AI-powered search engine that requires setup with Ollama or OpenAI-compatible providers.
Usage
- Example: Investigate the latest news on AI regulation in different regions (US, Europe, China, etc.) using only one tool call
- Automatically routes search queries to Perplexica
- Provides streaming responses with real-time updates
- Emits citations with source metadata for each result
- Handles conversation history for contextual searches
Features
- Streaming Support: Real-time streaming responses for faster interaction
- Smart Citations: Automatic citation generation with metadata (title, URL, content)
- Conversation History: Maintains context from previous messages (configurable)
- Multiple Focus Modes: webSearch, academicSearch, youtubeSearch, and more
- Status Updates: Real-time progress updates during search
- Source Tracking: Comprehensive source metadata with URLs and snippets
- Task Routing: Intelligent routing between search and non-search tasks
- Error Handling: Robust error handling with user-friendly messages
Getting Started
- Install Perplexica:
  - Follow the Perplexica installation guide
  - Set up your chat and embedding providers (Ollama, OpenAI, etc.)
  - Start the Perplexica server (default: http://localhost:3001)
- Configure the Pipe:
  - Open the pipe's Valves settings in Open WebUI
  - Set `perplexica_api_url` to your Perplexica instance URL
  - Configure your chat and embedding providers/models
  - Adjust focus mode and optimization settings as needed
- Start Searching:
  - Select the "Perplexica Pipe" model in Open WebUI
  - Ask questions or request web searches
  - View results with automatic citations and source links
Example of Perplexica pipe search results with citations in Open WebUI
ComfyUI Image-to-Image Tool (Qwen Image Edit 2509)
Description
Edit and transform images using ComfyUI workflows with AI-powered image editing. Features the Qwen Image Edit 2509 model as default, supporting up to 3 images for advanced editing with context, style transfer, and multi-image blending. Also includes Flux Kontext workflow for artistic transformations. Images are automatically extracted from message attachments and rendered as beautiful HTML embeds.
Configuration
- `comfyui_api_url` (str): ComfyUI HTTP API endpoint (default: `http://localhost:8188`)
- `workflow_type` (str): Choose your workflow - "Flux_Kontext", "QWen_Edit", or "Custom" (default: `QWen_Edit`)
- `custom_workflow` (Dict): Custom ComfyUI workflow JSON (only used when `workflow_type='Custom'`)
- `max_wait_time` (int): Maximum wait time in seconds for job completion (default: `600`)
- `unload_ollama_models` (bool): Automatically unload Ollama models from VRAM before generating images (default: `False`)
- `ollama_api_url` (str): Ollama API URL for model management (default: `http://localhost:11434`)
- `return_html_embed` (bool): Return a beautiful HTML image embed with comparison view (default: `True`)
Prerequisites: You must have ComfyUI installed and running with the required models and custom nodes:
- For Flux Kontext: Flux Dev model, Flux Kontext LoRA, and required ComfyUI nodes
- For Qwen Edit 2509: Qwen Image Edit 2509 model, Qwen CLIP, VAE, and ETN_LoadImageBase64 custom node
- See the Extras folder for workflow JSON files: `flux_context_owui_api_v1.json` and `image_qwen_image_edit_2509_api_owui.json`
Usage
- Example: Attach image(s) and provide editing instructions, e.g.:
  - "Remove the background"
  - "Change car to red"
  - "Apply lighting from first image to second image"
Features
- Qwen Edit 2509 (Default): State-of-the-art image editing with precise control and instruction-following
- Multi-Image Support: Qwen Edit workflow accepts 1-3 images for advanced editing with context and style transfer
- Dual Workflow Support: Switch to Flux Kontext for artistic transformations and creative reimagining
- Automatic Image Handling: Images are extracted from messages and passed to the AI automatically
- VRAM Management: Optional Ollama model unloading to free GPU memory before generation
- Beautiful HTML Embeds: Displays results with elegant before/after comparison view
- OpenWebUI Integration: Automatically uploads generated images to OpenWebUI storage
- Flexible Workflows: Use built-in workflows or provide your own custom ComfyUI JSON
Workflow Details
Qwen Edit 2509 (Default):
- Supports 1-3 images with multi-image context and style transfer
- Lightning-fast 4-step generation
- Best for: precise edits, object manipulation, style transfer
Flux Kontext (Alternative):
- Single image input (multi-image support planned)
- 20-step high-quality generation
- Best for: artistic transformations, creative reimagining
Custom Workflow:
- Bring your own ComfyUI workflow JSON
- Full flexibility for advanced users
Getting Started
- Set up ComfyUI:
  - Install ComfyUI
  - Download required models (Flux Dev, Qwen Edit 2509, etc.)
  - Install necessary custom nodes (especially `ETN_LoadImageBase64` for the Qwen workflow)
- Import workflows:
  - Load `Extras/flux_context_owui_api_v1.json` or `Extras/image_qwen_image_edit_2509_api_owui.json` in ComfyUI
  - Verify all nodes are recognized (install missing custom nodes if needed)
- Configure the tool:
  - Set `comfyui_api_url` to your ComfyUI server address
  - Choose your preferred workflow type
  - Optionally enable Ollama model unloading if you have limited VRAM
- Start editing:
  - Attach an image (or up to 3 for multi-image editing) to your message
  - Describe your desired transformation in natural language
  - Watch the magic happen!
Note for Custom Workflows: If you're using a custom workflow with different capabilities (e.g., single-image only or different prompting requirements), you should modify the edit_image function's docstring in the tool code. The docstring instructs the AI on how to use the tool and what prompting strategies work best. Adjust it to match your workflow's specific capabilities and requirements.
Multi-Image Support Status:
- Qwen Edit 2509: Full support for 1-3 images (default workflow)
- Flux Kontext: Single image currently; multi-image support planned for future release
- Custom workflows: Depends on your workflow implementation
Example of Qwen Image Edit 2509 transforming a cyberpunk dolphin into a natural mountain scene
ComfyUI ACE Step 1.5 Audio Tool
Description
Generate high-quality music using the improved ACE Step 1.5 model via ComfyUI. This tool builds upon the legacy version with enhanced control over musical elements like key, time signature, BPM, and language. It features the same beautiful embedded player and supports batch generation.
Configuration
- `comfyui_api_url` (str): ComfyUI API endpoint (default: `http://localhost:8188`)
- `model_name` (str): ACE Step 1.5 checkpoint name (default: `ace_step_1.5_turbo_aio.safetensors`)
- `batch_size` (int): Number of tracks to generate per request (default: `1`)
- `max_duration` (int): Maximum song duration in seconds (default: `180`)
- `max_number_of_steps` (int): Maximum allowed sampling steps (default: `50`)
- `max_wait_time` (int): Max wait time for generation in seconds (default: `600`)
- `workflow_json` (str): ComfyUI Workflow JSON (default: `ace_step_1.5_workflow`)
- `checkpoint_node` (str): Node ID for CheckpointLoaderSimple (default: `"97"`)
- `text_encoder_node` (str): Node ID for TextEncodeAceStepAudio1.5 (default: `"94"`)
- `empty_latent_node` (str): Node ID for EmptyAceStep1.5LatentAudio (default: `"98"`)
- `sampler_node` (str): Node ID for KSampler (default: `"3"`)
- `save_node` (str): Node ID for SaveAudioMP3 (default: `"104"`)
- `vae_decode_node` (str): Node ID for VAEDecodeAudio (default: `"18"`)
- `unload_node` (str): Node ID for UnloadAllModels (default: `"105"`)
- `owui_base_url` (str): Open WebUI base URL (default: `http://localhost:3000`)
- `save_local` (bool): Save generated audio to local storage (default: `True`)
- `show_player_embed` (bool): Show the embedded audio player (default: `True`)
- `unload_comfyui_models` (bool): Unload models after generation using the ComfyUI-Unload-Model node (default: `False`)
Prerequisites
- ComfyUI-Unload-Model Node: To use the model unloading feature (`unload_comfyui_models`), you must install the ComfyUI-Unload-Model custom node in your ComfyUI instance. Note: You can use other model unloading nodes in a custom workflow, but you must correctly configure the `unload_node` valve with the ID of that node.
User Configuration (Per-User Valves)
Users can customize these settings for their individual sessions by clicking the "Valves" icon in the chat interface:
- `generate_audio_codes` (bool): Enable/disable audio code generation. Disabling it (Fast Mode) speeds up generation but may reduce quality (default: `True`)
- `steps` (int): Number of sampling steps for generation. Higher values may improve quality but take longer (default: `8`, capped by the admin `max_number_of_steps`)
- `seed` (int): Random seed for generation. Set to `-1` for random, or a specific number for reproducible results (default: `-1`)
Usage
- Example: Generate a "cyberpunk, darkwave" song about "AI takeover" in E minor, 140 BPM, duration 60s
- Advanced Features:
  - Control Key Scale (e.g., "C Major", "F# Minor")
  - Set Time Signature (e.g., 4/4, 3/4)
  - Choose Language (e.g., "en", "ja", "zh")
Features
- New in 1.5: Key scale, time signature, language support, and improved audio quality
- Batch Generation: Generate multiple variations at once
- Embedded Player: Sleek, transparent player with lyrics and waveform visualization
- Customizable: Full control over generation parameters
ComfyUI ACE Step Audio Tool (Legacy)
Description
Generate music using the ACE Step AI model via ComfyUI. This tool lets you create songs from tags and lyrics, with full control over the workflow JSON and node numbers. Features a beautiful, transparent custom audio player with play/pause controls, progress tracking, volume adjustment, and a clean scrollable lyrics display. Designed for advanced music generation and can be customized for different genres and moods.
Configuration
- `comfyui_api_url` (str): ComfyUI API endpoint (e.g., `http://localhost:8188`)
- `model_name` (str): Model checkpoint to use (default: `ACE_STEP/ace_step_v1_3.5b.safetensors`)
- `workflow_json` (str): Full ACE Step workflow JSON as a string. Use `{tags}`, `{lyrics}`, and `{model_name}` as placeholders.
- `tags_node` (str): Node number for the tags input (default: `"14"`)
- `lyrics_node` (str): Node number for the lyrics input (default: `"14"`)
- `model_node` (str): Node number for the model checkpoint input (default: `"40"`)
- `save_local` (bool): Copy the generated song to the Open WebUI storage backend (default: `True`)
- `owui_base_url` (str): Your Open WebUI base URL (default: `http://localhost:3000`)
- `show_player_embed` (bool): Show the embedded audio player. If false, only returns a download link (default: `True`)
Usage
- Import the ACE Step workflow:
  - In ComfyUI, go to the workflow import section and load `extras/ace_step_api.json`.
  - Adjust nodes as needed for your setup.
- Configure the tool in Open WebUI:
  - Set the `comfyui_api_url` to your ComfyUI backend.
  - Paste the workflow JSON (from the file or your own) into `workflow_json`.
  - Set the correct node numbers if you modified the workflow.
- Generate music:
  - Provide a song title, tags, and (optionally) lyrics.
  - The tool will return either an embedded audio player or a download link based on your configuration.
- Example: Generate a song about AI and humanity's friendship
The sleek, transparent audio player embedded in Open WebUI chat
Features
- Custom Audio Player: Beautiful, semi-transparent player with blur effects
- Full Playback Controls: Play/pause, seek, volume control with SVG icons
- Song Title Display: User-defined song titles prominently shown
- Scrollable Lyrics: Clean lyrics display with custom scrollbar (max 120px height)
- Transparent UI: Integrates seamlessly with any Open WebUI theme
- Toggle Player: Option to show/hide player embed and just return download links
- Local Storage: Optionally saves songs to Open WebUI cache for persistence
Returns an embedded audio player with download link or just the link, depending on configuration. Advanced users can fully customize the workflow for different genres, moods, or creative experiments.
ComfyUI Text-to-Video Tool
Description
Generate short videos from text prompts using a ComfyUI workflow that defaults to the WAN 2.2 text-to-video models. This tool wraps the ComfyUI HTTP + WebSocket API, waits for the job to complete, extracts the produced video, and (optionally) uploads it to Open WebUI storage so it can be embedded in chat.
The default workflow file included in this repository is extras/video_wan2_2_14B_t2v.json and the tool implementation lives at tools/text_to_video_comfyui_tool.py.
Configuration
- `comfyui_api_url` (str): ComfyUI HTTP API endpoint (default: `http://localhost:8188`)
- `prompt_node_id` (str): Node ID in the workflow that receives the text prompt (default: `"89"`)
- `workflow` (json/dict): ComfyUI workflow JSON; if empty, the bundled WAN 2.2 workflow is used
- `max_wait_time` (int): Maximum seconds to wait for the ComfyUI run (default: `600`)
- `unload_ollama_models` (bool): Whether to unload Ollama models from VRAM before running (default: `False`)
- `ollama_api_url` (str): Ollama API URL used when unloading models (default: `http://localhost:11434`)
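For reference, the submit-and-wait flow maps onto ComfyUI's standard queue API. The sketch below illustrates that pattern with simple polling of `/history` (the actual tool listens on the WebSocket instead, and the `"text"` input key on the prompt node is an assumption that depends on the workflow):

```python
import json
import time
import requests

COMFYUI_URL = "http://localhost:8188"  # comfyui_api_url valve
PROMPT_NODE_ID = "89"                  # prompt_node_id valve

def run_workflow(workflow_path, prompt, timeout=600):
    # Load the API-format workflow and inject the prompt text
    # (the "text" input key is an assumption; it depends on the node type)
    with open(workflow_path) as f:
        workflow = json.load(f)
    workflow[PROMPT_NODE_ID]["inputs"]["text"] = prompt

    # Queue the job on ComfyUI
    resp = requests.post(f"{COMFYUI_URL}/prompt", json={"prompt": workflow}, timeout=30)
    resp.raise_for_status()
    prompt_id = resp.json()["prompt_id"]

    # Poll /history until outputs appear (the tool itself waits on the WebSocket)
    deadline = time.time() + timeout
    while time.time() < deadline:
        history = requests.get(f"{COMFYUI_URL}/history/{prompt_id}", timeout=30).json()
        if prompt_id in history and history[prompt_id].get("outputs"):
            return history[prompt_id]["outputs"]
        time.sleep(5)
    raise TimeoutError("ComfyUI job did not finish in time")

outputs = run_workflow(
    "extras/video_wan2_2_14B_t2v.json",
    "a cyberpunk panda skating through neon city streets",
)
print(outputs)
```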
Usage
- Import the workflow
  - In ComfyUI, import the workflow JSON `extras/video_wan2_2_14B_t2v.json` if you want to inspect or modify nodes.
- Install / Configure the tool
  - Copy `tools/text_to_video_comfyui_tool.py` into your Open WebUI tools and set the `comfyui_api_url` and other valves as needed in the tool settings.
- Generate a video
  - Call the tool with a prompt (e.g., "A cyberpunk panda skating through neon streets, 3s shot") and wait for the job to complete. The tool emits progress events and will provide an embedded HTML player or a direct ComfyUI URL.
- Example: Generate a 3 second shot of "a cyberpunk panda skating through neon city streets" using the default WAN 2.2 workflow
Example short video generated via ComfyUI WAN 2.2 workflow (thumbnail).
Features
- Uses WAN 2.2 text-to-video model workflow by default (`video_wan2_2_14B_t2v.json`)
- Submits the workflow to ComfyUI and listens on the WebSocket for completion
- Extracts produced video files and optionally uploads them to Open WebUI storage for inline embedding
- Optional Ollama VRAM unloading to free memory before runs
- Configurable prompt node and wait timeout
OpenWeatherMap Forecast Tool
Description
Tool that fetches weather forecasts using the OpenWeatherMap API and displays an interactive HTML weather widget with current conditions, hourly, and daily forecasts. Supports both the free 2.5 API and the premium One Call 3.0 API.
Configuration
- `openweathermap_api_key` (str): Your OpenWeatherMap API key (required)
- `api_version` (str): API version: '2.5' (free, includes current + 5-day/3h forecast) or '3.0' (One Call API, requires separate subscription) (default: `2.5`)
- `units` (str): Units of measurement: 'metric', 'imperial', or 'standard' (default: `metric`)
- `language` (str): Language code for weather descriptions (default: `en`)
- `show_weather_embed` (bool): Show the embedded weather widget (default: `True`)
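With the free 2.5 API, the widget's data comes from two public endpoints: `/data/2.5/weather` for current conditions and `/data/2.5/forecast` for the 5-day/3-hour forecast. A minimal sketch of those calls (not the tool's exact code):

```python
import requests

OWM_API_KEY = "your_openweathermap_api_key"

def fetch_forecast(city, units="metric"):
    """Fetch current conditions plus the 5-day/3-hour forecast from the free 2.5 API."""
    base = "https://api.openweathermap.org/data/2.5"
    params = {"q": city, "appid": OWM_API_KEY, "units": units}
    current = requests.get(f"{base}/weather", params=params, timeout=15).json()
    forecast = requests.get(f"{base}/forecast", params=params, timeout=15).json()
    return {"current": current, "forecast": forecast}

data = fetch_forecast("Tokyo,JP")
print(data["current"]["weather"][0]["description"], data["current"]["main"]["temp"])
```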
Usage
- Example: What is the weather like in Tokyo, JP?
- Fetches current conditions, hourly forecast, and multi-day daily forecast
- Displays an interactive weather widget and returns a text summary for the LLM
Example OpenWeatherMap Forecast Tool widget
Function Pipes
Flux Kontext ComfyUI Pipe
Description
A pipe that connects Open WebUI to the Flux Kontext image-to-image editing model through ComfyUI. This integration allows for advanced image editing, style transfers, and other creative transformations using the Flux Kontext workflow. Features an interactive /setup command system for easy configuration by administrators.
Configuration
The pipe includes an interactive setup system that allows administrators to configure all settings through chat commands. Most configuration can be done using the /setup command, which provides an interactive form for easy adjustment of parameters.
Key Configuration Options:
- `COMFYUI_ADDRESS`: Address of the running ComfyUI server (default: `http://127.0.0.1:8188`)
- `COMFYUI_WORKFLOW_JSON`: The entire ComfyUI workflow in JSON format
- `PROMPT_NODE_ID`: Node ID for text prompt input (default: `"6"`)
- `IMAGE_NODE_ID`: Node ID for Base64 image input (default: `"196"`)
- `KSAMPLER_NODE_ID`: Node ID for the sampler node (default: `"194"`)
- `ENHANCE_PROMPT`: Enable vision model-based prompt enhancement (default: `False`)
- `VISION_MODEL_ID`: Vision model to use for prompt enhancement
- `UNLOAD_OLLAMA_MODELS`: Free RAM by unloading Ollama models before generation (default: `False`)
- `MAX_WAIT_TIME`: Maximum wait time for generation in seconds (default: `1200`)
- `AUTO_CHECK_MODEL_LOADER`: Auto-detect model loader type for .safetensors or .gguf (default: `False`)
Usage
Initial Setup
- Import the workflow:
  - In ComfyUI, import `extras/flux_context_owui_api_v1.json` as a workflow
  - Adjust node IDs if you modify the workflow
- Configure using /setup command (Admin only):
  - Type `/setup` in the chat to launch the interactive configuration form
  - The form will display all current settings with input fields
  - Adjust any settings you need to change
  - Submit the form to apply and optionally save the configuration
  - Settings can be persisted to a backend config file for permanent storage
- Alternative: Manual configuration:
  - Access the pipe's Valves in Open WebUI's admin panel
  - Set `COMFYUI_ADDRESS` to your ComfyUI backend
  - Paste the workflow JSON into `COMFYUI_WORKFLOW_JSON`
  - Configure node IDs and other parameters as needed
Using the Pipe
- Basic image editing:
  - Upload an image to the chat
  - Provide a text prompt describing the desired changes
  - The pipe processes the image through ComfyUI and returns the edited result
- Enhanced prompts (optional):
  - Enable `ENHANCE_PROMPT` in settings
  - Set a `VISION_MODEL_ID` (e.g., a multimodal model like LLaVA or GPT-4V)
  - The vision model will analyze the input image and automatically refine your prompt for better results
- Memory management:
  - Enable `UNLOAD_OLLAMA_MODELS` to free RAM before generation
  - The default workflow includes a `Clean VRAM` node for VRAM management in ComfyUI
Example - Image editing:
Prompt: "Edit this image to look like a medieval fantasy king, preserving facial features."
[Upload image]
Example of Flux Kontext /setup command interface
Example of Flux Kontext image editing output
Google Veo Text-to-Video & Image-to-Video Pipe
Description
Generate high-quality videos from text prompts or a single image using Google Veo via the Gemini API. This pipe enables advanced video generation capabilities directly from Open WebUI, supporting creative and professional use cases. It supports both text-to-video and image-to-video generation.
Note: Only one image is supported as input at this time. Multi-image input is not available.
Configuration
- `GOOGLE_API_KEY` (str): Google API key for Gemini API access (required)
- `MODEL` (str): The Veo model to use for video generation (default: "veo-3.1-generate-preview")
- `ENHANCE_PROMPT` (bool): Use a vision model to enhance the prompt (default: False)
- `VISION_MODEL_ID` (str): Vision model to be used as the prompt enhancer
- `ENHANCER_SYSTEM_PROMPT` (str): System prompt for the prompt enhancement process
- `MAX_WAIT_TIME` (int): Max wait time for video generation in seconds (default: 1200)
Prerequisites:
- You must have access to the Google Gemini API and a valid API key.
- Only one image is supported as input for image-to-video generation (Gemini API limitation).
Usage
- Text-to-Video Example: Generate a video of "a futuristic city at sunset with flying cars"
- Image-to-Video Example: Create a video from this image: [Attach image]
Features
- Text-to-Video: Generate videos from descriptive text prompts
- Image-to-Video: Animate a single image into a video sequence
- High Quality: Leverages Google Veo's advanced video generation models
- Direct Embedding: Returns markdown-formatted video links for display in chat
- Status Updates: Progress and error reporting during generation
Limitations
- Only one image is supported as input for image-to-video generation (Gemini API limitation)
- Multi-image or video editing features are not available
Example Output
Example of Google Veo video generation output in Open WebUI
Planner Agent v2
Advanced autonomous agent with specialized model support, interactive user guidance, and comprehensive execution management.
This powerful agent autonomously generates and executes multi-step plans to achieve complex goals. It's a generalist agent capable of handling any text-based task, making it ideal for complex requests that would typically require multiple prompts and manual intervention.
Key Features
- Intelligent Planning: Automatically breaks down goals into actionable steps with dependency mapping
- Specialized Models: Dedicated models for writing (WRITER_MODEL), coding (CODER_MODEL), and tool usage (ACTION_MODEL) with automatic routing
- Quality Control: Real-time output analysis with quality scoring (0.0-1.0) and iterative improvement
- Interactive Error Handling: When actions fail or produce low-quality outputs, the system pauses and prompts you with options: retry with custom guidance/instructions, retry as-is, approve current output despite warnings, or abort the entire plan execution
- Live Progress: Real-time Mermaid diagrams with color-coded status indicators
- Template System: Final synthesis using `{{action_id}}` placeholders for seamless content assembly
- Native Tool Integration: Automatically discovers and uses all available Open WebUI tools
- Advanced Features: Lightweight context mode, concurrent execution, cross-action references (`@action_id`), and comprehensive validation
- MCP (OpenAPI servers) Support: Model Context Protocol integration coming soon for extended tool capabilities
Configuration
Core Models:
- `MODEL`: Main planning LLM
- `ACTION_MODEL`: Tool-based actions and general tasks
- `WRITER_MODEL`: Creative writing and documentation
- `CODER_MODEL`: Code generation and development
Temperature Controls:
- `PLANNING_TEMPERATURE` (0.8): Planning creativity
- `ACTION_TEMPERATURE` (0.7): Tool execution precision
- `WRITER_TEMPERATURE` (0.9): Creative writing freedom
- `CODER_TEMPERATURE` (0.3): Code generation accuracy
- `ANALYSIS_TEMPERATURE` (0.4): Output analysis precision
Execution Settings:
- `MAX_RETRIES` (3): Retry attempts per action
- `CONCURRENT_ACTIONS` (1): Parallel processing limit
- `ACTION_TIMEOUT` (300): Individual action timeout
- `SHOW_ACTION_SUMMARIES` (true): Detailed execution summaries
- `AUTOMATIC_TAKS_REQUIREMENT_ENHANCEMENT` (false): AI-enhanced requirements
Usage Examples
Multi-Media Content:
Search the latest AI news and create a song based on it; with that, search for stock images to use as an "album cover" and create a Spotify mockup in a plain HTML file with a vanilla JS layout, using those assets embedded for interactivity
Example of Planner Agent in action Using Gemini 2.5 flash and local music generation
Creative Writing:
Create an epic sci-fi adult novel based on current trends in academic news and social media about AI and other trending topics, with at least 10 chapters and a well-crafted world with rich characters; save each chapter, with an illustration, in an Obsidian folder named after the novel
Example of Planner Agent in action Using Gemini 2.5 flash and local image generation, local saving to obsidian and websearch
Interactive Error Recovery: The Planner Agent features intelligent error handling that engages with users when actions fail or produce suboptimal results. When issues occur, the system pauses execution and presents you with interactive options:
- Retry with Guidance: Provide custom instructions to help the agent understand what went wrong and how to improve
- Retry As-Is: Attempt the action again without modifications
- Approve Output: Accept warning-level outputs despite quality concerns
- Abort Execution: Stop the entire plan if the issue is critical
Example scenario: If an action fails to generate proper code or retrieve expected data,
you'll be prompted to either provide specific guidance ("try using a different approach")
or decide whether to continue with the current output.