ToolNeuron
Privacy-First AI Assistant for Android - Complete On-Device Intelligence
ToolNeuron is an advanced offline-first AI assistant for Android. It features complete on-device processing with enterprise-grade encryption, intelligent document understanding through RAG (Retrieval-Augmented Generation), text-to-speech, an extensible plugin system, AI character cards (TavernAI v2 compatible), persistent AI memory, and sophisticated memory management. Your data never leaves your device. No cloud dependencies. No subscriptions. True digital sovereignty.
Download APK * Join Discord * Report Issue
Why ToolNeuron?
Complete Privacy: Hardware-backed AES-256-GCM encryption. Zero telemetry. All processing happens on your device.
Sophisticated RAG System: Inject and query documents (PDF, Word, Excel, EPUB) with semantic search and encrypted knowledge bases.
Secure Memory Vault: Crash-recoverable encrypted storage with Write-Ahead Logging, LZ4 compression, and content deduplication.
Offline-First: Works completely offline after model downloads. No internet required for AI inference.
On-Device TTS: Text-to-speech with 10 voices, 5 languages, adjustable speed and quality -- all processed locally.
AI Character Cards: Full TavernAI v2 compatible persona system -- import/export character cards, avatar images, template variables ({{char}}/{{user}}), and post-history reinforcement for consistent roleplay.
Persistent AI Memory: Mem0-inspired memory system that learns about you across conversations -- automatic fact extraction, deduplication, forgetting curve, and persona-aware filtering.
Plugin System: 7 built-in plugins -- web search, file manager, calculator, clipboard, date/time, device info, and developer utilities -- extensible with custom plugins.
Advanced Features: Function calling, multi-modal generation, customizable inference parameters, and concurrent model downloads.
Table of Contents
- Features Overview
- Text Generation
- Image Generation
- Text-to-Speech (TTS)
- AI Personas & Character Cards
- AI Memory System
- Plugin System
- RAG System (Document Intelligence)
- Memory Vault (Secure Storage)
- Document Processing
- Model Management
- Privacy & Security
- Installation
- Quick Start
- Technical Details
- Use Cases
- Building from Source
- Roadmap
- FAQ
Features Overview
Core Capabilities
| Feature | Description |
|---|---|
| Text Generation | Run any GGUF model locally (Llama, Mistral, Gemma, Phi, Qwen, etc.) with streaming output |
| Image Generation | Stable Diffusion 1.5 with censored & uncensored variants, inpainting support |
| Text-to-Speech | On-device TTS with 10 voices, 5 languages, adjustable speed and denoising steps |
| AI Character Cards | TavernAI v2 compatible personas with import/export, avatar images, template vars, post-history reinforcement |
| AI Memory | Persistent memory across conversations -- automatic fact extraction, deduplication, forgetting curve |
| Plugin System | 7 plugins (web search, file manager, calculator, clipboard, date/time, device info, dev utils) with tool calling |
| RAG System | Document injection with hybrid search (BM25 + vector + RRF + MMR), encrypted knowledge bases |
| Memory Vault | Hardware-backed AES-256-GCM encryption, WAL crash recovery, LZ4 compression |
| Document Processing | Parse PDF, Word (.doc/.docx), Excel (.xls/.xlsx), EPUB, and plain text |
| Model Store | Browse and download models from HuggingFace -- General, Coding, Medical, Uncensored categories |
| Function Calling | Grammar-constrained tool calling with multi-turn agent execution (up to 5 rounds) |
| Secure Storage | Content deduplication, three-tier caching, automatic defragmentation |
| No Permissions | Load models without storage permissions using Android SAF |
AI Personas & Character Cards
Full character card system compatible with TavernAI v2 / SillyTavern format. Create, edit, import, and export AI personas with rich personality definitions.
Character Card Fields
| Field | Description |
|---|---|
| Name | Character's display name |
| Avatar | Image or emoji avatar (stored locally) |
| Description | Character background, lore, and traits |
| Personality | Core personality traits (PList or prose) |
| Scenario | Current scene or context |
| First Message | Opening greeting |
| Alternate Greetings | Multiple greeting variations |
| Example Messages | Dialogue samples for style calibration |
| Tags | Categorization tags |
| Creator Notes | Author notes (not sent to model) |
| System Prompt | Legacy raw system prompt (fallback) |
Key Features
- Template Variables: `{{char}}` resolves to the persona name, `{{user}}` resolves to "User" -- applied throughout the system prompt and messages
- Identity Framing: Automatic "You are {{char}}" directive prevents the model from confusing character/user identity
- Post-History Reinforcement: Compressed personality reminder injected after chat history (most influential position for small models, based on SillyTavern/MiniMax research)
- TavernAI v2 Import/Export: Import `.json` character cards from SillyTavern, Chub.ai, or any TavernAI v2 source. Export your personas to share
- Avatar Images: Pick images from gallery, stored locally in app files (no external URI permissions needed)
- Default Personas: Ships with Assistant, Luna, CodeBuddy, and Sage
How It Works
The system prompt is assembled from structured fields: description + personality + scenario + example messages. If no structured fields exist, it falls back to the raw systemPrompt. The post-history instruction is injected right before the model generates, giving it maximum influence on small on-device models.
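The assembly order above can be sketched in Kotlin. All names below (`Persona`, `buildSystemPrompt`) are illustrative assumptions, not ToolNeuron's actual API:

```kotlin
// Hypothetical sketch of the prompt-assembly order described above;
// field and function names are illustrative, not the app's real code.
data class Persona(
    val name: String,
    val description: String = "",
    val personality: String = "",
    val scenario: String = "",
    val exampleMessages: String = "",
    val systemPrompt: String = ""  // legacy raw prompt (fallback)
)

fun buildSystemPrompt(persona: Persona, userName: String = "User"): String {
    val structured = listOf(
        persona.description, persona.personality,
        persona.scenario, persona.exampleMessages
    ).filter { it.isNotBlank() }
    // Fall back to the raw legacy prompt when no structured fields exist.
    val body = if (structured.isEmpty()) persona.systemPrompt
               else structured.joinToString("\n\n")
    // Identity framing plus {{char}}/{{user}} template resolution.
    return "You are ${persona.name}.\n\n$body"
        .replace("{{char}}", persona.name)
        .replace("{{user}}", userName)
}
```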
AI Memory System
Persistent memory that lets the AI remember facts about you across conversations. Inspired by Mem0.
How It Works
- After each conversation turn, the MemoryExtractor uses the loaded LLM to extract user facts
- Facts are deduplicated using Jaccard similarity (threshold 0.7) with AUDN cycle (Add/Update/Delete/Noop)
- Stored memories are injected into the system prompt as context for future conversations
- Persona-aware: When a character persona is active, the extractor filters out roleplay traits -- only real user facts are saved
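The dedup step can be sketched as follows. The word-level tokenization and the decision thresholds other than 0.7 are assumptions for illustration, not the app's actual extractor:

```kotlin
// Illustrative Jaccard-based AUDN decision, per the description above.
fun jaccard(a: String, b: String): Double {
    val sa = a.lowercase().split(Regex("\\W+")).filter { it.isNotBlank() }.toSet()
    val sb = b.lowercase().split(Regex("\\W+")).filter { it.isNotBlank() }.toSet()
    if (sa.isEmpty() && sb.isEmpty()) return 1.0
    return (sa intersect sb).size.toDouble() / (sa union sb).size
}

// AUDN: Add a genuinely new fact, Update a near-duplicate, otherwise Noop.
// The 0.95 "identical" cutoff is an assumed value.
fun audnDecision(newFact: String, existing: List<String>, threshold: Double = 0.7): String {
    val best = existing.maxOfOrNull { jaccard(newFact, it) } ?: 0.0
    return when {
        best >= 0.95 -> "NOOP"       // effectively identical fact
        best >= threshold -> "UPDATE" // near-duplicate: merge/refresh
        else -> "ADD"                 // new fact
    }
}
```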
Memory Features
- Automatic Extraction: Facts extracted from conversations without user intervention
- Categories: Personal, Preference, Work, Interest, General
- Forgetting Curve: Memories decay over time based on access frequency and recency
- Memory Browser: View, search, filter, edit, and delete memories from the AI Memory screen
- Manual Memories: Add custom facts the AI should remember
- Source Tracking: Each memory links back to the conversation it came from
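The forgetting curve described above could look like the following. The exact formula ToolNeuron uses is not documented here; this sketch assumes a classic exponential (Ebbinghaus-style) decay whose half-life stretches with access count:

```kotlin
import kotlin.math.exp
import kotlin.math.ln

// Assumed retention score: exponential recency decay, slowed by reinforcement.
// halfLifeDays is a hypothetical tuning parameter.
fun retention(daysSinceAccess: Double, accessCount: Int, halfLifeDays: Double = 7.0): Double {
    // Each access stretches the half-life, so frequently used facts decay slower.
    val effectiveHalfLife = halfLifeDays * (1 + accessCount)
    return exp(-daysSinceAccess * ln(2.0) / effectiveHalfLife)
}
```

A memory browser could then sort or prune memories by this score.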
Text Generation
Model Support
- Format: Any GGUF model (Llama 3, Mistral, Gemma, Phi, Qwen, etc.)
- Size Range: 500MB (1B models) to 20GB+ (70B models)
- Quantization: All GGUF quantizations supported (Q2_K, Q4_K_M, Q5_K_S, Q6_K, Q8_0, F16, etc.)
- Model Categories: General, Medical, Research, Coding, Uncensored, Business, Cybersecurity
Performance
| Device Tier | RAM | Model Size | Speed |
|---|---|---|---|
| Budget | 6GB | 1-3B Q4 | 2-4 tokens/sec |
| Mid-Range | 8GB | 7-8B Q4 | 4-8 tokens/sec |
| Flagship | 12GB+ | 8B Q6 | 8-15 tokens/sec |
Users report roughly 7-second response times for 8B Q6 models on flagship devices.
Advanced Features
- Streaming Output: Real-time token-by-token generation with `Flow`
- Custom Parameters: Temperature, top-k, top-p, min-p, repeat penalty, context length
- System Prompts: Configure per-model system prompts
- Function Calling: Tool/function calling with grammar-based JSON schema enforcement
- Model Configuration: Save and manage configurations per model
- Device Optimization: Auto-detect device tier and recommend optimal parameters
- Memory Management: Memory-mapped model loading, automatic RAM optimization
Supported Callbacks
```kotlin
onToolCall(toolCall: ToolCall) // Function call detection
onDone(metrics: Metrics)       // Generation completion
onError(error: Throwable)      // Error handling
onMetrics(metrics: Metrics)    // Performance metrics
```
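A minimal sketch of wiring such callbacks, using a plain listener interface and a fake engine purely for illustration (the real engine streams via Flow, and its `ToolCall`/`Metrics` types are stand-ins here):

```kotlin
// Simplified stand-in for the engine's metrics type.
data class Metrics(val tokens: Int, val tokensPerSec: Double)

// Illustrative listener mirroring the callbacks above.
interface GenerationListener {
    fun onToken(token: String)
    fun onDone(metrics: Metrics)
    fun onError(error: Throwable)
}

// A fake engine that streams a canned response, to show the call order:
// onToken repeatedly, then onDone (or onError on failure).
fun streamFake(prompt: String, listener: GenerationListener) {
    try {
        val tokens = "Hello from $prompt".split(" ")
        tokens.forEach { listener.onToken("$it ") }
        listener.onDone(Metrics(tokens.size, 10.0))
    } catch (e: Throwable) {
        listener.onError(e)
    }
}
```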
Image Generation
Stable Diffusion 1.5
- Models: Censored and uncensored variants
- Engine: LocalDream integration with NPU/CPU support
- Generation Time: 30-90 seconds depending on device hardware
Capabilities
- Text-to-Image: Generate images from text prompts
- Inpainting: Edit specific regions with mask support
- Custom Parameters:
- Resolution: 512x512, 768x768, 1024x1024
- Steps: 10-50 inference steps
- CFG Scale: Prompt adherence control
- Seed: Reproducible generation
- Negative Prompts: Exclude unwanted elements
- Denoise Strength: Inpainting intensity
- Schedulers: DPM and other scheduler support
Advanced Features
- Intermediate Results: View generation progress with intermediate images
- Safety Checker: Optional NSFW content filtering
- Pony Model Support: Specialized anime/cartoon models
- Backend Control: Start, stop, restart generation backend
- State Monitoring: Real-time backend and generation state tracking
Text-to-Speech (TTS)
On-device speech synthesis powered by Supertonic TTS. All processing happens locally -- no cloud APIs, no data leaves your device.
Voice Options
- 10 Voices: 5 female (F1-F5) and 5 male (M1-M5)
- 5 Languages: English, Korean, Spanish, Portuguese, French
- Speed Control: 0.5x to 2.0x playback speed
- Denoising Steps: 1-8 steps (higher = better quality, slower synthesis)
Features
- Auto-speak: Automatically read assistant responses aloud after generation
- On-demand Loading: TTS model loads automatically on first use if not preloaded
- Load on App Start: Optionally preload the TTS model at launch for instant speech
- NNAPI Acceleration: Hardware acceleration on supported devices
- Playback Controls: Play, pause, resume, stop with real-time synthesis progress
- Per-message TTS: Tap the speak button on any assistant message to hear it
Settings
All TTS preferences are persisted and configurable from the Settings screen:
- Voice, language, speed, denoising steps
- Auto-speak toggle
- NNAPI hardware acceleration toggle
- Load on app start toggle
Plugin System
Extensible plugin architecture that integrates with LLM tool calling. Plugins execute locally and render custom UI for results.
Built-in Plugins (7)
Web Search
- Engine: DuckDuckGo search with configurable result count (5-10)
- Web Scraping: CSS selector-based content extraction from search results
- Safe Search: Optional safe search filtering
- Custom UI: Rich display of search results and scraped content
File Manager
- File Operations: List files, read text files, create files
- Document Reading: PDF and DOCX parsing via DocumentParser
- Search: Find files by name across app directories
Calculator
- Expressions: Supports +, -, *, /, ^, %, parentheses
- Functions: sqrt, sin, cos, tan, asin, acos, atan, log, log10, ln, abs, ceil, floor, round
- Constants: pi, e
- Unit Conversion: Length (m, km, mi, ft, in...), weight (kg, lb, oz...), time (s, min, h, day), data (b, kb, mb, gb, tb), temperature (C, F, K)
Clipboard
- Read: Access current clipboard content
- Write: Copy text to system clipboard
Date & Time
- Current Time: Get date/time with timezone support
- Arithmetic: Add/subtract days, hours, minutes from dates
- Timezone Conversion: Convert between any timezones
Device Info
- System: OS version, device model, CPU architecture
- Resources: RAM usage, battery level, storage available
- Network: Connectivity status
Dev Utils
- Text Transforms: uppercase, lowercase, reverse, title_case, snake_case, camel_case, trim
- Hashing: MD5, SHA-1, SHA-256, SHA-512
- UUID Generation: Bulk generation up to 10 at once
- Text Statistics: Character, word, line, and sentence counts
- JSON: Formatting and validation
- Base64: Encoding and decoding
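The hashing and Base64 utilities above can be implemented directly on the JDK's `MessageDigest` and `Base64`; this is a sketch of the idea, not necessarily the plugin's own code:

```kotlin
import java.security.MessageDigest
import java.util.Base64

// SHA-256 hex digest via the JDK's MessageDigest.
fun sha256Hex(input: String): String =
    MessageDigest.getInstance("SHA-256")
        .digest(input.toByteArray(Charsets.UTF_8))
        .joinToString("") { "%02x".format(it) }

// Base64 round-trip via java.util.Base64.
fun base64Encode(input: String): String =
    Base64.getEncoder().encodeToString(input.toByteArray(Charsets.UTF_8))

fun base64Decode(encoded: String): String =
    String(Base64.getDecoder().decode(encoded), Charsets.UTF_8)
```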
Plugin Architecture
- Tool Calling Integration: Plugins register as tools the LLM can invoke via grammar-based JSON schema enforcement (LAZY/STRICT modes)
- Custom UI Rendering: Each plugin provides Compose UI for displaying results
- Enable/Disable: Toggle individual plugins from the Settings screen
- Execution Metrics: Tracks execution time, success/failure per plugin call
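The register/enable/dispatch flow above can be sketched like this. The actual PluginManager interfaces are not shown in this README, so the `Plugin` and `PluginRegistry` names are assumptions:

```kotlin
// Hypothetical plugin contract: a name plus an execute entry point.
interface Plugin {
    val name: String
    fun execute(args: Map<String, String>): String
}

class PluginRegistry {
    private val plugins = mutableMapOf<String, Plugin>()
    private val enabled = mutableSetOf<String>()

    fun register(plugin: Plugin) {
        plugins[plugin.name] = plugin
        enabled += plugin.name  // plugins start enabled
    }

    fun setEnabled(name: String, on: Boolean) {
        if (on) enabled += name else enabled -= name
    }

    /** Routes a model-issued tool call to the matching enabled plugin. */
    fun dispatch(tool: String, args: Map<String, String>): String {
        val plugin = plugins[tool]?.takeIf { tool in enabled }
            ?: return "error: unknown or disabled tool '$tool'"
        return plugin.execute(args)
    }
}
```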
RAG System (Document Intelligence)
The RAG (Retrieval-Augmented Generation) system enables ToolNeuron to inject external knowledge into conversations, allowing the AI to answer questions based on your documents with semantic understanding.
RAG Creation Methods
1. From Text
Create knowledge bases from plain text input:
- Paste or type text content
- System chunks and embeds automatically
- Instant semantic search capability
2. From Files
Parse and embed documents:
- Supported Formats: PDF, Word (.doc/.docx), Excel (.xls/.xlsx), EPUB, TXT
- Multi-Sheet Excel: Each sheet embedded separately with metadata
- Table Extraction: Word tables preserved with structure
- Automatic Chunking: Intelligent text segmentation
- Metadata Tracking: File name, MIME type, source tracking
3. From Chat History
Convert conversations into queryable knowledge:
- Export chat history as RAG
- Enable AI to reference past conversations
- Preserve conversation context across sessions
4. From Neuron Packets
Import pre-built RAG files:
- `.neuronpacket` format
- Encrypted RAG sharing
- Version control and metadata
5. Secure RAG Creation
Enterprise-grade encrypted knowledge bases:
- Admin Password Protection: Master password for RAG access
- Read-Only Users: Grant limited access without admin privileges
- Hardware-Backed Encryption: AES-256-GCM with Android KeyStore
- User Management: Add/remove read-only users
- Access Control: Fine-grained permission system
RAG Features
Query System:
- Semantic Search: Embedding-based similarity search (cosine similarity)
- Top-K Results: Return most relevant chunks
- Context Injection: Automatically augment prompts with relevant knowledge
- Multi-RAG Support: Query across multiple loaded RAGs simultaneously
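The semantic-search step above boils down to ranking stored chunks by cosine similarity to the query embedding and keeping the top-k. A minimal sketch (dimensions and data layout are illustrative):

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors.
fun cosine(a: DoubleArray, b: DoubleArray): Double {
    var dot = 0.0; var na = 0.0; var nb = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]
    }
    return dot / (sqrt(na) * sqrt(nb))
}

// Rank chunk texts by similarity to the query and keep the k best.
fun topK(query: DoubleArray, chunks: Map<String, DoubleArray>, k: Int): List<String> =
    chunks.entries
        .sortedByDescending { cosine(query, it.value) }
        .take(k)
        .map { it.key }
```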
RAG Management:
- Enable/Disable: Control which RAGs are active for queries
- Lazy Loading: Load RAGs into memory on demand
- Status Tracking: INSTALLED, LOADED, LOADING, ERROR states
- Metadata: Domain, language, version, tags, embedding model info
- Size Management: Track RAG file size, compression ratio
- Delete/Export: Remove or share RAG files
Loading Modes:
- Embedded: RAG stored within app data (persistent)
- Transient: Temporary loading from external files
NeuronGraph Integration:
- Node-based knowledge representation
- Graph traversal for related concepts
- Serialization/deserialization support
Embedding Engine
- Model: all-MiniLM-L6-v2-Q5_K_M (768-dimensional embeddings)
- Auto-Download: Fetches embedding model from HuggingFace on first use
- Batch Processing: Efficient batch embedding generation
- Normalization: Optional L2 normalization for cosine similarity
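The L2 normalization mentioned above makes cosine similarity reduce to a plain dot product. A minimal illustrative version:

```kotlin
import kotlin.math.sqrt

// Scale a vector to unit L2 norm; zero vectors are returned unchanged.
fun l2Normalize(v: DoubleArray): DoubleArray {
    val norm = sqrt(v.sumOf { it * it })
    return if (norm == 0.0) v.copyOf() else DoubleArray(v.size) { v[it] / norm }
}
```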
RAG UI Features
- RAG Overlay: Transparent overlay shows retrieved context during chat
- RAG Data Explorer: Browse all chunks, edit metadata, view embeddings
- RAG Statistics: Size, chunk count, embedding coverage
- Search & Filter: Full-text search within RAGs
- Category & Tag Management: Organize RAG content
Memory Vault (Secure Storage)
The Memory Vault is ToolNeuron's sophisticated encrypted storage system, providing crash-recoverable, compressed, deduplicated storage with enterprise-grade security.
Core Architecture
Hardware-Backed Encryption:
- Algorithm: AES-256-GCM with 96-bit IV
- Key Storage: Android KeyStore (hardware-backed on supported devices)
- Key Migration: Automatic re-encryption on key rotation
- Auth-Tagged: GCM mode provides authentication and integrity
Write-Ahead Logging (WAL):
- Crash Recovery: Automatic recovery from crashes/power loss
- Transaction Safety: ACID-compliant operations
- Checkpoint System: Periodic index checkpointing
- Rollback Support: Restore from checkpoints on corruption
LZ4 Compression:
- Fast Compression: Real-time compression/decompression
- Ratio Tracking: Monitor compression efficiency
- Block-Level: Compress individual blocks for efficient I/O
- Configurable: Adjust compression level
Content Deduplication:
- SHA-256 Hashing: Identify duplicate content
- Reference Counting: Track shared content usage
- Automatic Cleanup: Remove unreferenced blocks
- Storage Efficiency: Reduce redundant encrypted data
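The hash-plus-refcount scheme above can be sketched as a toy in-memory content-addressed store (not the vault's actual block format):

```kotlin
import java.security.MessageDigest

// Toy content-addressed store: identical content is stored once and
// reference-counted, as in the deduplication scheme described above.
class DedupStore {
    private val blocks = mutableMapOf<String, ByteArray>()
    private val refCounts = mutableMapOf<String, Int>()

    private fun hash(data: ByteArray): String =
        MessageDigest.getInstance("SHA-256").digest(data)
            .joinToString("") { "%02x".format(it) }

    /** Stores data once per unique content; returns its content hash. */
    fun put(data: ByteArray): String {
        val key = hash(data)
        if (key !in blocks) blocks[key] = data
        refCounts[key] = (refCounts[key] ?: 0) + 1
        return key
    }

    /** Drops one reference; the block is freed when none remain. */
    fun release(key: String) {
        val n = (refCounts[key] ?: return) - 1
        if (n <= 0) { refCounts.remove(key); blocks.remove(key) }
        else refCounts[key] = n
    }

    fun uniqueBlocks(): Int = blocks.size
}
```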
Data Types
Messages
- Full-text indexed conversation messages
- Tokenization for search
- Timestamp tracking
- Category and tag support
Files
- Binary file storage with MIME type tracking
- Image, document, and arbitrary file support
- Metadata preservation
- Content deduplication
Embeddings
- 768-dimensional vector storage
- Semantic search with cosine similarity
- Batch embedding support
- Normalization options
Custom Data
- JSON-serialized custom structures
- Schema-flexible storage
- Queryable metadata
Caching System
Three-Tier Architecture:
- L1 Hot Cache: In-memory cache for frequently accessed items (< 1MB)
- L2 Memory-Mapped: Memory-mapped file access for warm data (< 5MB)
- L3 On-Demand: Disk-based access for cold data
Cache Metrics:
- Hit/miss rates
- Eviction tracking
- Memory usage monitoring
- Performance optimization
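The L1 hot cache described above behaves like a size-capped LRU. A hedged sketch built on `LinkedHashMap`'s access-order mode (the vault's real eviction policy is not specified here):

```kotlin
// Toy LRU hot cache with eviction counting, as an L1-tier illustration.
class HotCache<K, V>(private val maxEntries: Int) {
    var evictions = 0
        private set

    // accessOrder = true makes iteration order least-recently-used first.
    private val map = object : LinkedHashMap<K, V>(16, 0.75f, true) {
        override fun removeEldestEntry(eldest: MutableMap.MutableEntry<K, V>?): Boolean {
            val evict = size > maxEntries
            if (evict) evictions++
            return evict
        }
    }

    operator fun get(key: K): V? = map[key]          // counts as an access
    operator fun set(key: K, value: V) { map[key] = value }
    val size: Int get() = map.size
}
```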
Storage Operations
Search Capabilities:
- Full-Text Search: Tokenized text search across messages
- Semantic Search: Embedding-based similarity search
- Category Filter: Filter by predefined categories
- Tag Filter: Multi-tag filtering support
- Time Range: Search within date/time ranges
- Content Type Filter: Filter by data type
Maintenance:
- Defragmentation: Reclaim wasted space from deleted items
- Index Rebuilding: Reconstruct search indices
- Validation: Integrity checking and corruption detection
- Backup: Export vault with compression
- Restore: Import from backup files
Vault Statistics
Monitor storage health:
- Total Items: Count by type (messages, files, embeddings, custom)
- Size Metrics: Compressed vs uncompressed sizes
- Compression Ratio: Efficiency tracking
- Wasted Space: Identify fragmentation
- Time Range: Earliest and latest item timestamps
- Content Type Breakdown: Distribution across data types
Vault UI Features
- Vault Dashboard: Overview of all vault contents
- Statistics Screen: Detailed metrics and graphs
- Data Explorer: Browse, search, filter all items
- Metadata Editor: Edit categories, tags, search text
- User Management: Manage vault access credentials (admin, read-only)
- Logger Screen: Debug logs with operation timing and encryption metrics
Document Processing
Comprehensive document parsing with format detection and content extraction.
Supported Formats
PDF
- Engine: PDFBox-Android
- Capabilities: Text extraction, metadata parsing
- Streaming: Efficient I/O for large files
EPUB
- Engine: EpubLib
- Capabilities: E-book text extraction, chapter navigation
- HTML Cleanup: Automatic tag removal for plain text
Microsoft Word
- Formats: .docx (Office Open XML), .doc (binary)
- Engine: Apache POI
- Capabilities:
- Paragraph extraction
- Table parsing with structure preservation
- Formatting metadata
Microsoft Excel
- Formats: .xlsx (Office Open XML), .xls (binary)
- Engine: Apache POI
- Capabilities:
- Multi-sheet support with sheet names
- Cell type detection (string, numeric, boolean, formula)
- Formula evaluation
- Comprehensive cell formatting
Plain Text
- Encodings: UTF-8, UTF-16, ASCII
- Line Ending Support: Unix, Windows, Mac
Processing Features
- MIME Type Detection: Automatic format detection with fallback to file extension
- Error Handling: Informative error messages on parse failures
- Progress Tracking: Real-time parsing progress for large documents
- Metadata Extraction: Title, author, creation date, modification date
- Logging: Comprehensive debug logging for troubleshooting
Model Management
Model Store
In-App HuggingFace Integration:
- Browse HuggingFace model repositories
- Search with filters (model type, size, tags)
- Add custom repositories by username/org
- View model metadata (size, quantization, tags, downloads)
Download Management:
- Concurrent Downloads: Download multiple models simultaneously
- Progress Tracking: Real-time progress notifications
- WorkManager Integration: Robust background task management
- Resume Capability: Resume interrupted downloads
- Foreground Service: Persistent downloads (Android 14+ compliant)
Model Categories:
- General: General-purpose conversational models
- Medical: Healthcare and medical domain models
- Research: Academic and research-focused models
- Coding: Programming and code generation models
- Uncensored: Unfiltered and uncensored models
- Business: Professional and business domain models
- Cybersecurity: Security and penetration testing models
Model Configuration
Per-Model Settings:
Loading Parameters:
- Thread count (auto-detect or manual)
- Context size (512 to 32768+ tokens)
- Quantization options
- GPU layers (if supported)
Inference Parameters:
- Temperature (0.0 - 2.0)
- Top-k sampling
- Top-p (nucleus) sampling
- Min-p sampling
- Repeat penalty
- Frequency/presence penalty
- System prompt
- Seed (for reproducibility)
Configuration Storage:
- Database-backed persistence
- JSON serialization
- Import/export configurations
- Default configurations per model category
Model Picker
- Grid/list view of installed models
- Search and filter
- Model details (size, format, loaded status)
- Quick load/unload
- Delete models
Privacy & Security
Data Collection
ZERO DATA COLLECTION. No telemetry, analytics, crash reporting, or tracking of any kind.
What Stays Local
- All conversations and chat history
- Generated images and files
- Speech synthesis and TTS audio
- Plugin execution results
- Model configurations
- User preferences
- RAG knowledge bases
- Memory vault contents
- Document parsing results
Encryption Details
Storage Encryption:
- Algorithm: AES-256-GCM
- IV: 96-bit unique per-block
- Key Derivation: Android KeyStore hardware-backed
- Authentication: GCM auth tag for integrity verification
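The AES-256-GCM scheme above (96-bit IV, authenticated tag) maps directly onto the JDK crypto APIs. On-device the key lives in the Android KeyStore; this JVM sketch generates a throwaway key so it runs anywhere:

```kotlin
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.KeyGenerator
import javax.crypto.SecretKey
import javax.crypto.spec.GCMParameterSpec

// AES-256-GCM with a fresh 96-bit IV per block, per the scheme above.
fun encrypt(key: SecretKey, plaintext: ByteArray): Pair<ByteArray, ByteArray> {
    val iv = ByteArray(12).also { SecureRandom().nextBytes(it) }  // 96-bit IV
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, GCMParameterSpec(128, iv))
    return iv to cipher.doFinal(plaintext)  // ciphertext includes the GCM tag
}

fun decrypt(key: SecretKey, iv: ByteArray, ciphertext: ByteArray): ByteArray {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, GCMParameterSpec(128, iv))
    return cipher.doFinal(ciphertext)  // throws if the auth tag doesn't verify
}

// Throwaway software key for the sketch; real keys come from the KeyStore.
fun newAesKey(): SecretKey =
    KeyGenerator.getInstance("AES").apply { init(256) }.generateKey()
```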
Vault Security:
- Write-Ahead Logging for crash recovery
- Content deduplication prevents re-encryption overhead
- Secure key migration on rotation
- Automatic encryption of all stored data
RAG Security:
- Optional encryption for RAG packets
- Admin password protection
- Read-only user access control
- Hardware-backed key storage
No Permissions Required
- Storage Access Framework (SAF): Load models via file picker
- Scoped Storage: Modern Android storage compliance
- No Broad Access: App cannot access arbitrary files
Verification
Fully open source. Audit the code or review community security assessments.
Installation
System Requirements
Minimum (Text Only):
- Android 12+ (API 31)
- 6GB RAM
- 4GB free storage
- ARM64 or x86_64 processor
Recommended (Text + Image + TTS + RAG):
- Android 13+
- 8GB RAM (12GB preferred)
- 10GB free storage
- Snapdragon 8 Gen 1 or equivalent
- Hardware-backed encryption support
Download
Google Play Store (Recommended): Get it on Play Store
Direct APK: Download from GitHub Releases
Quick Start
1. Get Models
Option A: In-App Model Store (Recommended)
- Open ToolNeuron
- Navigate to Model Store (drawer menu)
- Add HuggingFace repository:
  - Example: `QuantFactory/Meta-Llama-3-8B-GGUF`
  - Example: `bartowski/Phi-3.5-mini-instruct-GGUF`
- Browse models and tap to download
- Return to home screen and select your model
Option B: Manual Download
- Visit Hugging Face GGUF Models
- Download a model file (e.g., `Llama-3-8B-Q4_K_M.gguf`)
- Open ToolNeuron
- Use model picker to load from file
- Grant file access via Android file picker
Recommended Models:
| Use Case | Model | Size | Description |
|---|---|---|---|
| Budget/Testing | TinyLlama-1.1B-Q4_K_M | 669MB | Fast, low resource |
| Balanced | Llama-3-8B-Q4_K_M | 4.5GB | Best quality/performance |
| Maximum Quality | Mistral-7B-Q6_K | 6GB | Highest quality 7B |
| Coding | DeepSeek-Coder-6.7B-Q4 | 4GB | Code generation |
| Medical | Bio-Medical-Llama-3-8B | 4.5GB | Healthcare domain |
2. Generate Text
- Launch ToolNeuron
- Select or import GGUF model (wait for loading progress)
- Model loads automatically (status bar shows progress)
- Start typing your prompt
- AI streams response in real-time
Pro Tips:
- Adjust temperature in model config (0.7 = balanced, 0.3 = focused, 1.0 = creative)
- Increase context size for longer conversations (but uses more RAM)
- Use system prompts to set AI behavior
3. Generate Images
- Download Stable Diffusion 1.5 model from HuggingFace:
- Search for "stable-diffusion-v1-5"
- Download the `.safetensors` or `.ckpt` file
- Import into ToolNeuron via model picker
- Switch to Image generation mode (toggle in chat screen)
- Enter your prompt (e.g., "a serene mountain landscape at sunset, 4k, photorealistic")
- Optional: Add negative prompt (e.g., "blurry, low quality, distorted")
- Tap generate and wait 30-90 seconds
- Image appears in chat with save option
4. Create RAG Knowledge Base
From Documents
- Navigate to RAG menu (drawer)
- Tap "Create New RAG"
- Select "From File"
- Choose document type (PDF, Word, Excel, EPUB, TXT)
- Pick file via file picker
- Set RAG name and metadata (optional: enable encryption)
- Wait for document parsing and embedding
- RAG appears in RAG list
From Text
- Tap "Create New RAG" → "From Text"
- Paste or type your content
- Set metadata (name, category, tags)
- Tap "Create"
- System chunks and embeds automatically
Using RAGs in Chat
- Enable desired RAGs in RAG management screen
- Return to chat
- RAG overlay button appears (tap to view retrieved context)
- AI automatically uses relevant RAG content in responses
- Retrieved chunks show in overlay for transparency
Technical Details
Architecture
Core Stack:
- Language: Kotlin (Android), C++ (inference engines via JNI)
- UI Framework: Jetpack Compose (declarative UI)
- Text Inference: llama.cpp (GGUF engine)
- Image Inference: LocalDream (Stable Diffusion 1.5)
- Database: Room (SQLite) with AES-256-GCM encryption
- Async: Kotlin Coroutines + Flow
- Dependency Injection: Dagger Hilt
- Navigation: Jetpack Navigation Compose
- Serialization: Kotlinx Serialization + Gson
Custom Modules:
- `memory-vault`: Encrypted storage with WAL and compression
- `neuron-packet`: RAG packet format with encryption and access control
- `ai_gguf-release.aar`: Native GGUF inference library
- `ai_sd-release.aar`: Native Stable Diffusion library
- `ai_supertonic_tts`: On-device TTS with ONNX Runtime
Inference Engines
GGUF Engine (GGUFEngine.kt):
- Native JNI bindings to llama.cpp
- Loading: File path or file descriptor (SAF/content:// URIs)
- Streaming: Token-by-token generation with `Flow`
- Callbacks: `onToken`, `onToolCall`, `onDone`, `onError`, `onMetrics`
- Device Detection: Auto-detect device tier (LOW_END, MID_RANGE, HIGH_END)
- Optimization: Automatic thread/context recommendations
Diffusion Engine (DiffusionEngine.kt):
- Integration with StableDiffusionManager (LocalDream)
- NPU/CPU backend support
- Text embedding size configuration
- Pony model support
- Intermediate result streaming
- Safety checker toggle
Embedding Engine (EmbeddingEngine.kt):
- Model: all-MiniLM-L6-v2-Q5_K_M
- Dimensions: 768
- Operations: Single/batch embedding, normalization
- Auto-download on first use
TTS Engine (TTSManager.kt):
- Supertonic TTS with ONNX Runtime Android
- 10 voices (F1-F5, M1-M5), 5 languages
- Configurable denoising steps (1-8) and speed (0.5x-2.0x)
- Optional NNAPI hardware acceleration
- StateFlow-based reactive state management
Plugin Engine (PluginManager.kt):
- Plugin registration with tool schemas for LLM integration
- Grammar modes: LAZY (flexible) and STRICT (enforced JSON schema)
- Execution metrics tracking (timing, success/failure)
- Compose UI rendering per plugin result
- Enable/disable per plugin at runtime
Storage System
Memory Vault:
- Block-based storage with headers
- Three-tier caching (L1 hot, L2 memory-mapped, L3 on-demand)
- Write-Ahead Logging for crash recovery
- LZ4 compression with ratio tracking
- SHA-256 content deduplication
- Full-text and semantic search indices
Database Schema (Room v6, 5 entities):
- Models table (GGUF/SD metadata)
- ModelConfig table (loading + inference params)
- InstalledRAGs table (RAG metadata with status)
- Personas table (character cards with TavernAI v2 fields)
- AiMemories table (extracted user facts with categories, embeddings, forgetting curve)
- DataStore (preferences -- settings, active persona, TTS config)
Performance Benchmarks
Text Generation (8B Q4_K_M on flagship device):
- Model Load Time: 5-15 seconds
- First Token: 1-3 seconds
- Generation Speed: 8-15 tokens/sec
- Context Processing: 500+ tokens/sec
Image Generation (SD 1.5):
- Mid-Range (SD 8 Gen 1): 60-90 seconds
- Flagship (SD 8 Gen 3): 30-50 seconds
- Resolution: 512x512 (fastest), 1024x1024 (slower)
RAG Query:
- Single RAG (1000 chunks): < 100ms
- Multiple RAGs (5000+ chunks): 100-500ms
- Embedding generation: 50-200ms per chunk
Memory Usage:
- Idle: 200-500MB
- 8B Model Loaded: 5-6GB
- With RAGs: +100-500MB
- Vault Cache: 3-5MB
Use Cases
Privacy-Critical Applications
- Medical Professionals: HIPAA-compliant patient data handling
- Legal Professionals: Confidential document analysis
- Journalists: Protecting source anonymity
- Therapists: Private session notes and analysis
- Researchers: Sensitive data processing
- Anyone Valuing Digital Sovereignty
Offline Scenarios
- Air travel (no WiFi required)
- Remote locations (rural, wilderness)
- Areas with unreliable internet
- Avoiding mobile data costs
- Military/government secure environments
Document Intelligence
- Knowledge Base Creation: Convert company docs, manuals, research papers into queryable RAGs
- Study Aid: Embed textbooks and lecture notes for AI-assisted learning
- Research: Query across multiple PDFs and papers simultaneously
- Legal Document Review: Search case law and contracts with semantic understanding
- Medical Reference: Embed medical literature for clinical decision support
Creative & Development
- Writing and brainstorming assistance
- Code generation and debugging
- Image generation for content creation
- Learning and research
- Creative storytelling with AI
Building from Source
Prerequisites
- Android Studio Ladybug (2024.2.1) or newer
- JDK 17
- Android SDK 36+
- Android NDK 26.x
- Git
Steps
```shell
git clone https://github.com/Siddhesh2377/ToolNeuron.git
cd ToolNeuron

# Open in Android Studio: File → Open → select the ToolNeuron folder
# Sync Gradle dependencies (Android Studio will prompt)

# Create local.properties (optional, for signing)
# Add: ALIAS="your_keystore_alias"

# Build release APK
./gradlew assembleRelease

# Install on connected device
./gradlew installRelease

# Or build debug version
./gradlew assembleDebug
./gradlew installDebug
```
Build Output
- Release APK: `app/build/outputs/apk/release/app-release.apk`
- Debug APK: `app/build/outputs/apk/debug/app-debug.apk`
Troubleshooting
- NDK Issues: Ensure NDK 26.x is installed via SDK Manager
- JNI Build Failures: Check that `local.properties` has the correct NDK path
- Memory Issues: Increase Gradle heap size in `gradle.properties`: `org.gradle.jvmargs=-Xmx4096m`
Roadmap
Version 2.0.0 (Current - February 2026)
AI Personas & Character Cards:
- TavernAI v2 compatible character card system
- Full character editor (description, personality, scenario, example messages, alternate greetings, tags)
- Avatar image support (local storage, Coil 3)
- Template variables (`{{char}}`, `{{user}}`)
- Identity framing and post-history reinforcement for small models
- TavernAI v2 JSON import/export
- Default personas (Assistant, Luna, CodeBuddy, Sage)
AI Memory System:
- Persistent cross-conversation memory (Mem0-inspired)
- Automatic fact extraction via LLM
- AUDN deduplication (Jaccard similarity)
- Persona-aware extraction (filters out roleplay traits)
- Memory categories (Personal, Preference, Work, Interest, General)
- Forgetting curve with access-based reinforcement
- Memory browser screen (search, filter, edit, delete)
Enhanced Chat Experience:
- Thinking mode toggle for supported models
- Regenerate last message
- Code syntax highlighting (toggleable)
- Improved markdown rendering with LazyList integration
- Character card quick-access in bottom bar
RAG Pipeline Improvements:
- Hybrid retrieval: FTS4 BM25 + vector search + RRF + MMR
- Confidence gating for low-quality results
- RAG enable/disable toggle per chat session
- Improved RAG management UX
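Reciprocal Rank Fusion (RRF) merges the BM25 and vector rankings by scoring each document as the sum of 1/(k + rank) across the lists. A minimal sketch of the standard formula (the rrf function and k = 60 default are illustrative, not ToolNeuron's code):

```kotlin
// Illustrative Reciprocal Rank Fusion: merge ranked lists by summing 1 / (k + rank).
fun rrf(rankings: List<List<String>>, k: Int = 60): List<String> {
    val scores = mutableMapOf<String, Double>()
    for (ranking in rankings) {
        ranking.forEachIndexed { rank, doc ->
            scores.merge(doc, 1.0 / (k + rank + 1), Double::plus)
        }
    }
    return scores.entries.sortedByDescending { it.value }.map { it.key }
}
```

Documents that rank well in either list float to the top without either retriever's raw scores needing to be comparable.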
Model Store Updates:
- Uncensored model category with curated repos
- LiquidAI LFM2-350M added
- Qwen3 8B added
Stability & Fixes:
- UTF-8 stream decoding fix (smart quotes, accented characters)
- Surrogate pair handling for emoji above U+FFFF
- Database migrations v4 → v5 → v6
- Improved UI across all screens (Model Store, RAG Creation, Settings, Home)
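The UTF-8 streaming fix above addresses a classic bug: a multi-byte character split across two network chunks decodes as garbage if each chunk is decoded independently. A minimal sketch of the technique using java.nio's incremental decoder (StreamingUtf8Decoder is illustrative, not ToolNeuron's class):

```kotlin
import java.nio.ByteBuffer
import java.nio.CharBuffer
import java.nio.charset.StandardCharsets

// Illustrative sketch: incremental UTF-8 decoding that tolerates multi-byte
// characters (including 4-byte emoji) split across stream chunks.
class StreamingUtf8Decoder {
    private val decoder = StandardCharsets.UTF_8.newDecoder()
    private var leftover = ByteBuffer.allocate(0)

    fun feed(chunk: ByteArray): String {
        val input = ByteBuffer.allocate(leftover.remaining() + chunk.size)
        input.put(leftover)
        input.put(chunk)
        input.flip()
        val out = CharBuffer.allocate(input.remaining() + 1)
        decoder.decode(input, out, false) // false: more input may follow
        // Carry any incomplete trailing byte sequence over to the next chunk.
        val rest = ByteBuffer.allocate(input.remaining())
        rest.put(input)
        rest.flip()
        leftover = rest
        out.flip()
        return out.toString()
    }
}
```

With endOfInput = false, the decoder leaves an incomplete sequence unconsumed, so it can be prepended to the next chunk instead of being emitted as a replacement character.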
Previous Releases
Version 1.2.0 (January 2026):
- Text generation with any GGUF model
- Image generation with SD 1.5
- HuggingFace repository integration
- Encrypted Memory Vault with WAL
- RAG system with document injection
- Secure RAG creation with encryption
- Document processing (PDF, Word, Excel, EPUB)
- Model configuration editor
- Concurrent model downloads
- Function calling support
- Inpainting support
- Text-to-Speech (10 voices, 5 languages, NNAPI, auto-speak)
- Plugin system with UI (web search, calculator, dev utils)
- Settings screen with persistent preferences
Upcoming
Version 2.1 (Q2 2026):
- Speech-to-Text (STT) support
- Multi-modal support (vision models like LLaVA, BakLLaVA)
- Code execution plugin with sandboxing
- Advanced memory clustering and insights
- Conversation summarization
Version 2.2 (Q3 2026):
- Additional model formats (ONNX, TFLite, CoreML)
- Desktop companion app (Windows, macOS, Linux)
- Cloud sync with end-to-end encryption (optional)
- Plugin marketplace
- Advanced RAG features (graph-based reasoning)
Comparison
| Feature | ToolNeuron | Cloud AI Apps | Other Local AI Apps |
|---|---|---|---|
| Text Generation | Any GGUF model | Cloud only | Limited models |
| Image Generation | SD 1.5 offline | Cloud only | Rare |
| Text-to-Speech | On-device, 10 voices | Cloud-based | Rare |
| Character Cards | TavernAI v2 compatible | N/A | SillyTavern only |
| AI Memory | Persistent cross-session | Some cloud apps | None |
| Plugin System | 7 built-in plugins | Cloud-based | None |
| RAG System | Hybrid BM25+Vector+RRF | Cloud-based | Basic or none |
| Document Processing | PDF/Word/Excel/EPUB | Cloud upload | Limited |
| Privacy | Complete offline | Server logging | Varies |
| Encryption | AES-256-GCM + WAL | N/A | Rare |
| Cost | Free (one-time) | $20-50+/month | Varies |
| Internet Required | No (after models) | Yes | Varies |
| Open Source | Apache 2.0 | Proprietary | Varies |
| Storage Permissions | Not needed (SAF) | N/A | Usually needed |
| Function Calling | Grammar-constrained multi-turn | Yes | Rare |
| Model Store | In-app HF browser | N/A | Manual download |
Community Testimonials
"The only LLM frontend capable of running 8B Q6 models on my hardware with lightspeed loading. I'm in military healthcare and privacy is critical. ToolNeuron is the only app that meets my requirements." -- Senior Healthcare Professional, Netherlands
"I use ToolNeuron for legal document analysis. The RAG system with encrypted storage gives me confidence that client data stays confidential. No other app comes close." -- Attorney, United States
"As a journalist, I can't risk my sources being exposed through cloud AI services. ToolNeuron's offline-first approach is exactly what I needed." -- Investigative Journalist, Germany
FAQ
General
Q: Does this really work completely offline? A: Yes. After downloading models and the embedding model, all AI processing (text generation, image generation, RAG queries, document parsing) happens entirely on your device with zero internet dependency.
Q: How much storage do I need? A: Minimum 4GB for a single 7B model. Recommended 10GB for multiple models, SD 1.5, and RAGs. Large setups with many models can use 20GB+.
Q: Will this drain my battery? A: Local AI is computationally intensive. During active generation, battery drain is significant. Keep your device charged during extended use. Idle usage is minimal.
Q: Is my data actually private? A: Yes. Nothing leaves your device. All processing is local. The code is open source - you can verify yourself or review community audits.
Q: Can I use custom models? A: Yes. Any GGUF text model works. For image generation, Stable Diffusion 1.5 checkpoints are supported (.safetensors or .ckpt).
Technical
Q: Why don't you need storage permissions? A: Android's Storage Access Framework (SAF) allows file picker access without broad storage permissions. Users explicitly select files, granting app access only to chosen files.
Q: Why is image generation slow? A: Stable Diffusion 1.5 requires 20-50 inference steps, each computationally expensive, and mobile hardware is far slower than desktop GPUs. 30-90 seconds per image is normal.
Q: Can I run 13B or 70B models? A: Depends on device RAM. 13B Q4 needs ~10GB RAM (requires 12GB device). 70B models are impractical on current mobile hardware (need 40GB+ RAM).
Q: What quantization should I use? A: For 8B models:
- Q4_K_M: Best balance (4.5GB, good quality)
- Q5_K_S: Higher quality (5GB)
- Q6_K: Maximum quality (6GB, slower)
- Q2_K: Ultra-compressed (2.5GB, lower quality)
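The sizes above follow a simple back-of-envelope rule: file size ≈ parameter count × bits per weight / 8. A quick sketch (approxModelSizeGb is a hypothetical helper; real GGUF files add metadata and mixed-precision layers, so treat this as a rough estimate):

```kotlin
// Illustrative estimate: model file size in GB from parameters and quantization width.
fun approxModelSizeGb(paramsBillion: Double, bitsPerWeight: Double): Double =
    paramsBillion * bitsPerWeight / 8.0
```

An 8B model at roughly 4.5 bits per weight lands near the 4.5 GB listed for Q4_K_M.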
Q: Does RAG work without internet? A: Yes. RAG embedding and querying are completely offline after initial embedding model download (~100MB).
Q: How secure is the encryption? A: AES-256-GCM with hardware-backed keys (Android KeyStore) is military-grade encryption. On supported devices, keys are stored in Trusted Execution Environment (TEE) or Secure Element (SE), making extraction extremely difficult.
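On the JVM, the same AES-256-GCM primitive is available through javax.crypto; a minimal round-trip sketch (ToolNeuron keeps its keys in the Android KeyStore, whereas this desktop-friendly sketch generates a throwaway key with KeyGenerator purely for illustration):

```kotlin
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.KeyGenerator
import javax.crypto.spec.GCMParameterSpec

// Illustrative AES-256-GCM encrypt/decrypt round trip using javax.crypto.
fun aesGcmRoundTrip(plaintext: String): String {
    val key = KeyGenerator.getInstance("AES").apply { init(256) }.generateKey()
    val iv = ByteArray(12).also { SecureRandom().nextBytes(it) } // 96-bit nonce
    val enc = Cipher.getInstance("AES/GCM/NoPadding")
    enc.init(Cipher.ENCRYPT_MODE, key, GCMParameterSpec(128, iv)) // 128-bit auth tag
    val ciphertext = enc.doFinal(plaintext.toByteArray(Charsets.UTF_8))
    val dec = Cipher.getInstance("AES/GCM/NoPadding")
    dec.init(Cipher.DECRYPT_MODE, key, GCMParameterSpec(128, iv))
    return String(dec.doFinal(ciphertext), Charsets.UTF_8)
}
```

GCM's authentication tag means tampered ciphertext fails decryption outright rather than decoding to corrupted plaintext.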
Q: Can I share encrypted RAGs?
A: Yes. Export RAG as .neuron packet with encryption enabled. Share the file and password separately. Recipients can import and decrypt with the password.
Q: How does Text-to-Speech work? A: TTS uses the Supertonic TTS engine running entirely on-device via ONNX Runtime. Download the TTS model from the Model Store or directly from the Settings screen under the TTS section, then tap the speak button on any assistant message. You can also enable auto-speak in Settings to have responses read aloud automatically.
Q: Can I use TTS offline? A: Yes. After downloading the TTS model (~100MB), all speech synthesis happens locally with no internet required.
Q: What do "denoising steps" control in TTS? A: Higher steps produce clearer, more natural speech but take longer to synthesize. The default of 2 steps provides a good balance. Increase to 4-8 for maximum quality.
Troubleshooting
Q: App crashes on model load? A: Likely out of memory. Try:
- Close other apps
- Use smaller model (Q4 instead of Q6, or 1B/3B instead of 7B/8B)
- Reduce context size in model config
- Restart device to free RAM
Q: Image generation fails or crashes? A: SD 1.5 requires significant RAM. Ensure:
- Device has 8GB+ RAM
- No other heavy apps running
- Try lower resolution (512x512)
- Restart app and try again
Q: Models download but won't load? A: Check:
- File is complete (compare size to HuggingFace listing)
- File is valid GGUF format
- Model isn't corrupted (re-download if suspicious)
- Sufficient RAM available
Q: RAG queries return no results? A: Verify:
- RAG is enabled in RAG management screen
- RAG loaded successfully (check status)
- Embedding model downloaded
- Query is semantically related to RAG content
Contributing
Contributions are welcome! Focus areas:
Priority Areas
- Bug fixes and stability improvements
- Performance optimizations (inference speed, memory usage)
- Device compatibility testing (especially mid-range devices)
- Documentation improvements and translations
- UI/UX enhancements (accessibility, dark theme refinements)
Development Areas
- Speech-to-Text (STT) integration
- Multi-modal model support
- Additional plugins and plugin marketplace
- Additional document format support
- Advanced RAG features
Process
- Fork the repository
- Create a feature branch: git checkout -b feature/your-feature-name
- Make changes with clear, focused commits
- Test on real devices (emulators don't reflect real performance)
- Write clear commit messages explaining "why" not just "what"
- Submit Pull Request with description of changes and testing done
Code Style
- Follow Kotlin coding conventions
- Use meaningful variable/function names
- Comment complex logic
- Prefer immutability where possible
- Use Jetpack Compose best practices for UI
Testing
- Test on multiple devices (low-end, mid-range, flagship)
- Verify memory usage doesn't regress
- Test offline functionality
- Check encryption/decryption operations
- Validate UI on different screen sizes
License
Apache License 2.0
See LICENSE for full details.
What This Means
- Commercial use permitted
- Modification permitted
- Distribution permitted
- Patent use permitted
- Private use permitted
- Trademark use not permitted
- No liability or warranty (software provided "as is")
You can use ToolNeuron in commercial products, modify it, and distribute it; just retain the license and copyright notices as Apache 2.0 requires. Attribution beyond that is appreciated but not required.
Acknowledgments
ToolNeuron stands on the shoulders of giants:
Core Technologies
- llama.cpp - Efficient LLM inference by Georgi Gerganov
- LocalDream - Stable Diffusion on Android
- Jetpack Compose - Modern Android UI framework
Libraries
- ONNX Runtime - TTS model inference
- Coil 3 - Image loading for avatar images
- Jsoup - HTML parsing for web search plugin
- Apache POI - Microsoft document parsing
- PDFBox-Android - PDF processing
- EpubLib - EPUB support
- Room - Database abstraction
- Dagger Hilt - Dependency injection
- OkHttp - HTTP client
- Retrofit - Type-safe HTTP client
Inspiration
- SillyTavern - Character card format and post-history instruction techniques
- Mem0 - AI memory architecture (AUDN cycle, extraction patterns)
- Privacy-first movement
- Open source community
- User feedback and feature requests
Support
Get Help
- Discord Community: Join Server - Active community for questions, tips, and discussions
- GitHub Issues: Report Bug/Request Feature
- Email: siddheshsonar2377@gmail.com
Support Development
- Star this repository if you find it useful
- Report bugs to help improve stability
- Suggest features to guide development
- Improve documentation for better onboarding
- Contribute code to add features
- Help others in Discord community
Security
Reporting Vulnerabilities
If you discover a security vulnerability, please:
- Do NOT open a public GitHub issue
- Email siddheshsonar2377@gmail.com with details
- Include steps to reproduce
- Allow reasonable time for fix before public disclosure
We take security seriously and will respond promptly.
Built with ❤️ by Siddhesh Sonar
Privacy-first AI for everyone. Own your data. Own your AI.
Star this repository * Report Bug * Request Feature * Join Discord
Quick Links
Installation * Quick Start * Features * RAG System * Memory Vault * FAQ
License: Apache 2.0 * Version: 2.0.0 * Platform: Android 12+