
OpenAdapt: AI-First Process Automation with Large Multimodal Models (LMMs)

OpenAdapt is the open source software adapter between Large Multimodal Models (LMMs) and traditional desktop and web GUIs.

Record GUI demonstrations, train ML models, and evaluate agents - all from a unified CLI.

Join us on Discord | Documentation | OpenAdapt.ai


Architecture

OpenAdapt v1.0+ uses a modular meta-package architecture. The main openadapt package provides a unified CLI and depends on focused sub-packages via PyPI:

| Package | Description | Repository |
|---|---|---|
| openadapt | Meta-package with unified CLI | This repo |
| openadapt-capture | Event recording and storage | openadapt-capture |
| openadapt-ml | ML engine, training, inference | openadapt-ml |
| openadapt-evals | Benchmark evaluation | openadapt-evals |
| openadapt-viewer | HTML visualization | openadapt-viewer |
| openadapt-grounding | UI element localization | openadapt-grounding |
| openadapt-retrieval | Multimodal demo retrieval | openadapt-retrieval |
| openadapt-privacy | PII/PHI scrubbing | openadapt-privacy |
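A minimal sketch of how a tool could detect which sub-packages from the table above are present in the current environment (in the spirit of `openadapt doctor`). The underscore module names are an assumption about how the PyPI packages import; the detection technique itself is standard Python.

```python
# Hypothetical sketch: report which OpenAdapt sub-packages are importable.
# The module names (underscore form of the PyPI names above) are assumed,
# not taken from the actual openadapt codebase.
from importlib.util import find_spec

SUBPACKAGES = [
    "openadapt_capture",
    "openadapt_ml",
    "openadapt_evals",
    "openadapt_viewer",
    "openadapt_grounding",
    "openadapt_retrieval",
    "openadapt_privacy",
]

def installed_subpackages() -> dict[str, bool]:
    """Map each sub-package to whether it can be imported."""
    return {name: find_spec(name) is not None for name in SUBPACKAGES}

if __name__ == "__main__":
    for name, present in installed_subpackages().items():
        print(f"{name}: {'installed' if present else 'missing'}")
```

Because the meta-package only declares the extras you ask for, a check like this is how a CLI can degrade gracefully when an optional sub-package is absent.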

Installation

Install what you need:

pip install openadapt # Minimal CLI only
pip install openadapt[capture] # GUI capture/recording
pip install openadapt[ml] # ML training and inference
pip install openadapt[evals] # Benchmark evaluation
pip install openadapt[privacy] # PII/PHI scrubbing
pip install openadapt[all] # Everything

Requirements: Python 3.10+


Quick Start

1. Record a demonstration

openadapt capture start --name my-task
# Perform actions in your GUI, then press Ctrl+C to stop

2. Train a model

openadapt train start --capture my-task --model qwen3vl-2b

3. Evaluate

openadapt eval run --checkpoint training_output/model.pt --benchmark waa

4. View recordings

openadapt capture view my-task

CLI Reference

| Command | Description |
|---|---|
| `openadapt capture start --name` | Start recording |
| `openadapt capture stop` | Stop recording |
| `openadapt capture list` | List captures |
| `openadapt capture view` | Open capture viewer |
| `openadapt train start --capture` | Train model on capture |
| `openadapt train status` | Check training progress |
| `openadapt train stop` | Stop training |
| `openadapt eval run --checkpoint` | Evaluate trained model |
| `openadapt eval run --agent api-claude` | Evaluate API agent |
| `openadapt eval mock --tasks 10` | Run mock evaluation |
| `openadapt serve --port 8080` | Start dashboard server |
| `openadapt version` | Show installed versions |
| `openadapt doctor` | Check system requirements |

How It Works

See the full Architecture Evolution for detailed documentation.

Three-Phase Pipeline

flowchart TB
%% -----------------------------------------------------------------------
%% DATA SOURCES (Multi-Source Ingestion)
%% -----------------------------------------------------------------------
subgraph DataSources["Data Sources"]
direction LR
HUMAN["Human Demos"]
SYNTH["Synthetic Data"]:::future
BENCH_DATA["Benchmark Tasks"]
end

%% -----------------------------------------------------------------------
%% PHASE 1: DEMONSTRATE (Observation Collection)
%% -----------------------------------------------------------------------
subgraph Demonstrate["1. DEMONSTRATE (Observation Collection)"]
direction TB
CAP["Capture
openadapt-capture"]
PRIV["Privacy
openadapt-privacy"]
STORE[("Demo Library")]

CAP --> PRIV
PRIV --> STORE
end

%% -----------------------------------------------------------------------
%% PHASE 2: LEARN (Policy Acquisition)
%% -----------------------------------------------------------------------
subgraph Learn["2. LEARN (Policy Acquisition)"]
direction TB

subgraph RetrievalPath["Retrieval Path"]
EMB["Embed"]
IDX["Index"]
SEARCH["Search"]
EMB --> IDX --> SEARCH
end

subgraph TrainingPath["Training Path"]
LOADER["Load"]
TRAIN["Train"]
CKPT[("Checkpoint")]
LOADER --> TRAIN --> CKPT
end

subgraph ProcessMining["Process Mining"]
ABSTRACT["Abstract"]:::future
PATTERNS["Patterns"]:::future
ABSTRACT --> PATTERNS
end
end

%% -----------------------------------------------------------------------
%% PHASE 3: EXECUTE (Agent Deployment)
%% -----------------------------------------------------------------------
subgraph Execute["3. EXECUTE (Agent Deployment)"]
direction TB

subgraph AgentCore["Agent Core"]
OBS["Observe"]
POLICY["Policy
(Demo-Conditioned)"]
GROUND["Grounding
openadapt-grounding"]
ACT["Act"]

OBS --> POLICY
POLICY --> GROUND
GROUND --> ACT
end

subgraph SafetyGate["Safety Gate"]
VALIDATE["Validate"]
CONFIRM["Confirm"]:::future
VALIDATE --> CONFIRM
end

subgraph Evaluation["Evaluation"]
EVALS["Evals
openadapt-evals"]
METRICS["Metrics"]
EVALS --> METRICS
end

ACT --> VALIDATE
VALIDATE --> EVALS
end

%% -----------------------------------------------------------------------
%% THE ABSTRACTION LADDER (Side Panel)
%% -----------------------------------------------------------------------
subgraph AbstractionLadder["Abstraction Ladder"]
direction TB
L0["Literal
(Raw Events)"]
L1["Symbolic
(Semantic Actions)"]
L2["Template
(Parameterized)"]
L3["Semantic
(Intent)"]:::future
L4["Goal
(Task Spec)"]:::future

L0 --> L1
L1 --> L2
L2 -.-> L3
L3 -.-> L4
end

%% -----------------------------------------------------------------------
%% MODEL LAYER
%% -----------------------------------------------------------------------
subgraph Models["Model Layer (VLMs)"]
direction TB
subgraph APIModels["API Models"]
direction LR
CLAUDE["Claude"]
GPT["GPT-4o"]
GEMINI["Gemini"]
end
subgraph OpenSource["Open Source / Fine-tuned"]
direction LR
QWEN3["Qwen3-VL"]
UITARS["UI-TARS"]
OPENCUA["OpenCUA"]
end
end

%% -----------------------------------------------------------------------
%% MAIN DATA FLOW
%% -----------------------------------------------------------------------

%% Data sources feed into phases
HUMAN --> CAP
SYNTH -.-> LOADER
BENCH_DATA --> EVALS

%% Demo library feeds learning
STORE --> EMB
STORE --> LOADER
STORE -.-> ABSTRACT

%% Learning outputs feed execution
SEARCH -->|"demo context"| POLICY
CKPT -->|"trained policy"| POLICY
PATTERNS -.->|"templates"| POLICY

%% Model connections
POLICY --> Models
GROUND --> Models

%% -----------------------------------------------------------------------
%% FEEDBACK LOOPS (Evaluation-Driven)
%% -----------------------------------------------------------------------
METRICS -->|"success traces"| STORE
METRICS -.->|"training signal"| TRAIN

%% Retrieval in BOTH training AND evaluation
SEARCH -->|"eval conditioning"| EVALS

%% -----------------------------------------------------------------------
%% STYLING
%% -----------------------------------------------------------------------

%% Phase colors
classDef phase1 fill:#3498DB,stroke:#1A5276,color:#fff
classDef phase2 fill:#27AE60,stroke:#1E8449,color:#fff
classDef phase3 fill:#9B59B6,stroke:#6C3483,color:#fff

%% Component states
classDef implemented fill:#2ECC71,stroke:#1E8449,color:#fff
classDef future fill:#95A5A6,stroke:#707B7C,color:#fff,stroke-dasharray: 5 5
classDef futureBlock fill:#f5f5f5,stroke:#95A5A6,stroke-dasharray: 5 5
classDef safetyBlock fill:#E74C3C,stroke:#A93226,color:#fff

%% Model layer
classDef models fill:#F39C12,stroke:#B7950B,color:#fff

%% Apply styles
class CAP,PRIV,STORE phase1
class EMB,IDX,SEARCH,LOADER,TRAIN,CKPT phase2
class OBS,POLICY,GROUND,ACT,VALIDATE,EVALS,METRICS phase3
class CLAUDE,GPT,GEMINI,QWEN3,UITARS,OPENCUA models
class L0,L1,L2 implemented

Core Approach: Demo-Conditioned Prompting

OpenAdapt explores demonstration-conditioned automation - "show, don't tell":

| Traditional Agent | OpenAdapt Agent |
|---|---|
| User writes prompts | User records demonstration |
| Ambiguous instructions | Grounded in actual UI |
| Requires prompt engineering | Reduced prompt engineering |
| Context-free | Context from similar demos |

Retrieval powers BOTH training AND evaluation: Similar demonstrations are retrieved as context for the VLM. In early experiments on a controlled macOS benchmark, this improved first-action accuracy from 46.7% to 100% - though all 45 tasks in that benchmark share the same navigation entry point. See the publication roadmap for methodology and limitations.
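The retrieval step described above can be sketched as nearest-neighbor search over demo embeddings. This is an illustrative, self-contained version: the demo names and three-dimensional vectors are invented, and the real pipeline (openadapt-retrieval) uses multimodal embeddings rather than hand-written ones.

```python
# Minimal sketch of demonstration retrieval: rank stored demos by cosine
# similarity to a query embedding and return the top-k as prompt context.
# All names and vectors here are illustrative, not the openadapt-retrieval API.
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_demos(query: list[float],
                   library: dict[str, list[float]],
                   k: int = 2) -> list[str]:
    """Return the names of the k demos most similar to the query embedding."""
    ranked = sorted(library, key=lambda name: cosine(query, library[name]),
                    reverse=True)
    return ranked[:k]

demo_library = {
    "open-settings": [0.9, 0.1, 0.0],
    "rename-file": [0.1, 0.8, 0.2],
    "send-email": [0.0, 0.2, 0.9],
}
context = retrieve_demos([0.85, 0.15, 0.05], demo_library, k=1)
# The retrieved demos would then be prepended to the VLM prompt as context.
```

The same search runs in two places: before training (to condition the policy on similar demos) and at evaluation time (to condition the agent on its demo library).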

Key Concepts

  • Policy/Grounding Separation: The Policy decides what to do; Grounding determines where to do it
  • Safety Gate: Runtime validation layer before action execution (confirm mode for high-risk actions)
  • Abstraction Ladder: Progressive generalization from literal replay to goal-level automation
  • Evaluation-Driven Feedback: Success traces become new training data

Legend: Solid = Implemented | Dashed = Future
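The Policy/Grounding separation from the concepts above can be made concrete with a small sketch. All interfaces and names here are hypothetical stand-ins, not the actual OpenAdapt API: the policy decides *what* to do (normally a VLM call), and grounding resolves *where* by mapping the semantic target to coordinates.

```python
# Illustrative sketch of the Policy/Grounding split. The dataclasses and
# functions are invented for this example, not OpenAdapt's real interfaces.
from dataclasses import dataclass

@dataclass
class Intent:
    verb: str      # e.g. "click", "type"
    target: str    # semantic description, e.g. "Save button"

@dataclass
class GroundedAction:
    verb: str
    x: int
    y: int

def policy(observation: str) -> Intent:
    """Decide WHAT to do from an observation (stand-in for a VLM call)."""
    return Intent(verb="click", target="Save button")

def ground(intent: Intent,
           ui_elements: dict[str, tuple[int, int]]) -> GroundedAction:
    """Decide WHERE to act by resolving the target to screen coordinates."""
    x, y = ui_elements[intent.target]
    return GroundedAction(verb=intent.verb, x=x, y=y)

elements = {"Save button": (640, 480)}
action = ground(policy("screenshot-of-editor"), elements)
```

Keeping the two stages separate means the policy model can be swapped (API VLM vs. fine-tuned checkpoint) without touching the grounding layer, and a Safety Gate can validate the grounded action before it executes.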


Terminology

| Term | Description |
|---|---|
| Observation | What the agent perceives (screenshot, accessibility tree) |
| Action | What the agent does (click, type, scroll, etc.) |
| Trajectory | Sequence of observation-action pairs |
| Demonstration | Human-provided example trajectory |
| Policy | Decision-making component that maps observations to actions |
| Grounding | Mapping intent to specific UI elements (coordinates) |

Demos


Permissions

macOS: Grant Accessibility, Screen Recording, and Input Monitoring permissions to your terminal. See permissions guide.

Windows: Run as Administrator if needed for input capture.


Legacy Version

The monolithic OpenAdapt codebase (v0.46.0) is preserved in the legacy/ directory.

To use the legacy version:

pip install openadapt==0.46.0

See docs/LEGACY_FREEZE.md for migration guide and details.


Contributing

  1. Join Discord
  2. Pick an issue from the relevant sub-package repository
  3. Submit a PR

For sub-package development:

git clone https://github.com/OpenAdaptAI/openadapt-ml # or other sub-package
cd openadapt-ml
pip install -e ".[dev]"

Related Projects


Support


License

MIT License - see LICENSE for details.
