
OpenAdapt: AI-First Process Automation with Large Multimodal Models (LMMs)

OpenAdapt is the open source software adapter between Large Multimodal Models (LMMs) and traditional desktop and web GUIs.

Record GUI demonstrations, train ML models, and evaluate agents - all from a unified CLI.

Join us on Discord | Documentation | OpenAdapt.ai


Architecture

OpenAdapt v1.0+ uses a modular meta-package architecture. The main openadapt package provides a unified CLI and depends on focused sub-packages via PyPI:

| Package | Description | Repository |
|---|---|---|
| openadapt | Meta-package with unified CLI | This repo |
| openadapt-capture | Event recording and storage | openadapt-capture |
| openadapt-ml | ML engine, training, inference | openadapt-ml |
| openadapt-evals | Benchmark evaluation | openadapt-evals |
| openadapt-viewer | HTML visualization | openadapt-viewer |
| openadapt-grounding | UI element localization | openadapt-grounding |
| openadapt-retrieval | Multimodal demo retrieval | openadapt-retrieval |
| openadapt-privacy | PII/PHI scrubbing | openadapt-privacy |
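A minimal sketch of how a tool could detect which sub-packages from the table above are present in the current environment (in the spirit of `openadapt doctor`). The underscore module names are an assumption about how the PyPI packages import; the detection technique itself is standard Python.

```python
# Hypothetical sketch: report which OpenAdapt sub-packages are importable.
# The module names (underscore form of the PyPI names above) are assumed,
# not taken from the actual openadapt codebase.
from importlib.util import find_spec

SUBPACKAGES = [
    "openadapt_capture",
    "openadapt_ml",
    "openadapt_evals",
    "openadapt_viewer",
    "openadapt_grounding",
    "openadapt_retrieval",
    "openadapt_privacy",
]

def installed_subpackages() -> dict[str, bool]:
    """Map each sub-package to whether it can be imported."""
    return {name: find_spec(name) is not None for name in SUBPACKAGES}

if __name__ == "__main__":
    for name, present in installed_subpackages().items():
        print(f"{name}: {'installed' if present else 'missing'}")
```

Because the meta-package only declares the extras you ask for, a check like this is how a CLI can degrade gracefully when an optional sub-package is absent.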

Installation

Install what you need:

pip install openadapt # Minimal CLI only
pip install openadapt[capture] # GUI capture/recording
pip install openadapt[ml] # ML training and inference
pip install openadapt[evals] # Benchmark evaluation
pip install openadapt[privacy] # PII/PHI scrubbing
pip install openadapt[all] # Everything

Requirements: Python 3.10+


Quick Start

1. Record a demonstration

openadapt capture start --name my-task
# Perform actions in your GUI, then press Ctrl+C to stop

2. Train a model

openadapt train start --capture my-task --model qwen3vl-2b

3. Evaluate

openadapt eval run --checkpoint training_output/model.pt --benchmark waa

4. View recordings

openadapt capture view my-task

CLI Reference

| Command | Description |
|---|---|
| `openadapt capture start --name` | Start recording |
| `openadapt capture stop` | Stop recording |
| `openadapt capture list` | List captures |
| `openadapt capture view` | Open capture viewer |
| `openadapt train start --capture` | Train model on capture |
| `openadapt train status` | Check training progress |
| `openadapt train stop` | Stop training |
| `openadapt eval run --checkpoint` | Evaluate trained model |
| `openadapt eval run --agent api-claude` | Evaluate API agent |
| `openadapt eval mock --tasks 10` | Run mock evaluation |
| `openadapt serve --port 8080` | Start dashboard server |
| `openadapt version` | Show installed versions |
| `openadapt doctor` | Check system requirements |

How It Works

See the full Architecture Evolution for detailed documentation.

Three-Phase Pipeline

flowchart TB
%% -----------------------------------------------------------------------
%% DATA SOURCES (Multi-Source Ingestion)
%% -----------------------------------------------------------------------
subgraph DataSources["Data Sources"]
direction LR
HUMAN["Human Demos"]
SYNTH["Synthetic Data"]:::future
BENCH_DATA["Benchmark Tasks"]
end

%% -----------------------------------------------------------------------
%% PHASE 1: DEMONSTRATE (Observation Collection)
%% -----------------------------------------------------------------------
subgraph Demonstrate["1. DEMONSTRATE (Observation Collection)"]
direction TB
CAP["Capture
openadapt-capture"]
PRIV["Privacy
openadapt-privacy"]
STORE[("Demo Library")]

CAP --> PRIV
PRIV --> STORE
end

%% -----------------------------------------------------------------------
%% PHASE 2: LEARN (Policy Acquisition)
%% -----------------------------------------------------------------------
subgraph Learn["2. LEARN (Policy Acquisition)"]
direction TB

subgraph RetrievalPath["Retrieval Path"]
EMB["Embed"]
IDX["Index"]
SEARCH["Search"]
EMB --> IDX --> SEARCH
end

subgraph TrainingPath["Training Path"]
LOADER["Load"]
TRAIN["Train"]
CKPT[("Checkpoint")]
LOADER --> TRAIN --> CKPT
end

subgraph ProcessMining["Process Mining"]
ABSTRACT["Abstract"]:::future
PATTERNS["Patterns"]:::future
ABSTRACT --> PATTERNS
end
end

%% -----------------------------------------------------------------------
%% PHASE 3: EXECUTE (Agent Deployment)
%% -----------------------------------------------------------------------
subgraph Execute["3. EXECUTE (Agent Deployment)"]
direction TB

subgraph AgentCore["Agent Core"]
OBS["Observe"]
POLICY["Policy
(Demo-Conditioned)"]
GROUND["Grounding
openadapt-grounding"]
ACT["Act"]

OBS --> POLICY
POLICY --> GROUND
GROUND --> ACT
end

subgraph SafetyGate["Safety Gate"]
VALIDATE["Validate"]
CONFIRM["Confirm"]:::future
VALIDATE --> CONFIRM
end

subgraph Evaluation["Evaluation"]
EVALS["Evals
openadapt-evals"]
METRICS["Metrics"]
EVALS --> METRICS
end

ACT --> VALIDATE
VALIDATE --> EVALS
end

%% -----------------------------------------------------------------------
%% THE ABSTRACTION LADDER (Side Panel)
%% -----------------------------------------------------------------------
subgraph AbstractionLadder["Abstraction Ladder"]
direction TB
L0["Literal
(Raw Events)"]
L1["Symbolic
(Semantic Actions)"]
L2["Template
(Parameterized)"]
L3["Semantic
(Intent)"]:::future
L4["Goal
(Task Spec)"]:::future

L0 --> L1
L1 --> L2
L2 -.-> L3
L3 -.-> L4
end

%% -----------------------------------------------------------------------
%% MODEL LAYER
%% -----------------------------------------------------------------------
subgraph Models["Model Layer (VLMs)"]
direction TB
subgraph APIModels["API Models"]
direction LR
CLAUDE["Claude"]
GPT["GPT-4o"]
GEMINI["Gemini"]
end
subgraph OpenSource["Open Source / Fine-tuned"]
direction LR
QWEN3["Qwen3-VL"]
UITARS["UI-TARS"]
OPENCUA["OpenCUA"]
end
end

%% -----------------------------------------------------------------------
%% MAIN DATA FLOW
%% -----------------------------------------------------------------------

%% Data sources feed into phases
HUMAN --> CAP
SYNTH -.-> LOADER
BENCH_DATA --> EVALS

%% Demo library feeds learning
STORE --> EMB
STORE --> LOADER
STORE -.-> ABSTRACT

%% Learning outputs feed execution
SEARCH -->|"demo context"| POLICY
CKPT -->|"trained policy"| POLICY
PATTERNS -.->|"templates"| POLICY

%% Model connections
POLICY --> Models
GROUND --> Models

%% -----------------------------------------------------------------------
%% FEEDBACK LOOPS (Evaluation-Driven)
%% -----------------------------------------------------------------------
METRICS -->|"success traces"| STORE
METRICS -.->|"training signal"| TRAIN

%% Retrieval in BOTH training AND evaluation
SEARCH -->|"eval conditioning"| EVALS

%% -----------------------------------------------------------------------
%% STYLING
%% -----------------------------------------------------------------------

%% Phase colors
classDef phase1 fill:#3498DB,stroke:#1A5276,color:#fff
classDef phase2 fill:#27AE60,stroke:#1E8449,color:#fff
classDef phase3 fill:#9B59B6,stroke:#6C3483,color:#fff

%% Component states
classDef implemented fill:#2ECC71,stroke:#1E8449,color:#fff
classDef future fill:#95A5A6,stroke:#707B7C,color:#fff,stroke-dasharray: 5 5
classDef futureBlock fill:#f5f5f5,stroke:#95A5A6,stroke-dasharray: 5 5
classDef safetyBlock fill:#E74C3C,stroke:#A93226,color:#fff

%% Model layer
classDef models fill:#F39C12,stroke:#B7950B,color:#fff

%% Apply styles
class CAP,PRIV,STORE phase1
class EMB,IDX,SEARCH,LOADER,TRAIN,CKPT phase2
class OBS,POLICY,GROUND,ACT,VALIDATE,EVALS,METRICS phase3
class CLAUDE,GPT,GEMINI,QWEN3,UITARS,OPENCUA models
class L0,L1,L2 implemented

Core Approach: Demo-Conditioned Prompting

OpenAdapt explores demonstration-conditioned automation - "show, don't tell":

| Traditional Agent | OpenAdapt Agent |
|---|---|
| User writes prompts | User records demonstration |
| Ambiguous instructions | Grounded in actual UI |
| Requires prompt engineering | Reduced prompt engineering |
| Context-free | Context from similar demos |

Retrieval powers BOTH training AND evaluation: Similar demonstrations are retrieved as context for the VLM. In early experiments on a controlled macOS benchmark, this improved first-action accuracy from 46.7% to 100% - though all 45 tasks in that benchmark share the same navigation entry point. See the publication roadmap for methodology and limitations.
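The retrieval step described above can be sketched as nearest-neighbor search over demo embeddings. This is an illustrative, self-contained version: the demo names and three-dimensional vectors are invented, and the real pipeline (openadapt-retrieval) uses multimodal embeddings rather than hand-written ones.

```python
# Minimal sketch of demonstration retrieval: rank stored demos by cosine
# similarity to a query embedding and return the top-k as prompt context.
# All names and vectors here are illustrative, not the openadapt-retrieval API.
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_demos(query: list[float],
                   library: dict[str, list[float]],
                   k: int = 2) -> list[str]:
    """Return the names of the k demos most similar to the query embedding."""
    ranked = sorted(library, key=lambda name: cosine(query, library[name]),
                    reverse=True)
    return ranked[:k]

demo_library = {
    "open-settings": [0.9, 0.1, 0.0],
    "rename-file": [0.1, 0.8, 0.2],
    "send-email": [0.0, 0.2, 0.9],
}
context = retrieve_demos([0.85, 0.15, 0.05], demo_library, k=1)
# The retrieved demos would then be prepended to the VLM prompt as context.
```

The same search runs in two places: before training (to condition the policy on similar demos) and at evaluation time (to condition the agent on its demo library).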

Key Concepts

  • Policy/Grounding Separation: The Policy decides what to do; Grounding determines where to do it
  • Safety Gate: Runtime validation layer before action execution (confirm mode for high-risk actions)
  • Abstraction Ladder: Progressive generalization from literal replay to goal-level automation
  • Evaluation-Driven Feedback: Success traces become new training data

Legend: Solid = Implemented | Dashed = Future
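The Policy/Grounding separation from the concepts above can be made concrete with a small sketch. All interfaces and names here are hypothetical stand-ins, not the actual OpenAdapt API: the policy decides *what* to do (normally a VLM call), and grounding resolves *where* by mapping the semantic target to coordinates.

```python
# Illustrative sketch of the Policy/Grounding split. The dataclasses and
# functions are invented for this example, not OpenAdapt's real interfaces.
from dataclasses import dataclass

@dataclass
class Intent:
    verb: str      # e.g. "click", "type"
    target: str    # semantic description, e.g. "Save button"

@dataclass
class GroundedAction:
    verb: str
    x: int
    y: int

def policy(observation: str) -> Intent:
    """Decide WHAT to do from an observation (stand-in for a VLM call)."""
    return Intent(verb="click", target="Save button")

def ground(intent: Intent,
           ui_elements: dict[str, tuple[int, int]]) -> GroundedAction:
    """Decide WHERE to act by resolving the target to screen coordinates."""
    x, y = ui_elements[intent.target]
    return GroundedAction(verb=intent.verb, x=x, y=y)

elements = {"Save button": (640, 480)}
action = ground(policy("screenshot-of-editor"), elements)
```

Keeping the two stages separate means the policy model can be swapped (API VLM vs. fine-tuned checkpoint) without touching the grounding layer, and a Safety Gate can validate the grounded action before it executes.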


Terminology

| Term | Description |
|---|---|
| Observation | What the agent perceives (screenshot, accessibility tree) |
| Action | What the agent does (click, type, scroll, etc.) |
| Trajectory | Sequence of observation-action pairs |
| Demonstration | Human-provided example trajectory |
| Policy | Decision-making component that maps observations to actions |
| Grounding | Mapping intent to specific UI elements (coordinates) |

Demos


Permissions

macOS: Grant Accessibility, Screen Recording, and Input Monitoring permissions to your terminal. See permissions guide.

Windows: Run as Administrator if needed for input capture.


Legacy Version

The monolithic OpenAdapt codebase (v0.46.0) is preserved in the legacy/ directory.

To use the legacy version:

pip install openadapt==0.46.0

See docs/LEGACY_FREEZE.md for migration guide and details.


Contributing

  1. Join Discord
  2. Pick an issue from the relevant sub-package repository
  3. Submit a PR

For sub-package development:

git clone https://github.com/OpenAdaptAI/openadapt-ml # or other sub-package
cd openadapt-ml
pip install -e ".[dev]"

Related Projects


Support


License

MIT License - see LICENSE for details.
