Gandalf Solver

An LLM agent with persistent external memory that solves Lakera's Gandalf prompt injection challenge. It demonstrates a measurable reduction in attempts per level through structured reflection and knowledge accumulation.

Results

| Metric | Baseline | Taxonomy | Improvement |
|---|---|---|---|
| Total attempts | 32 | 21 | 34% fewer |
| Avg attempts per level | 4.0 | 2.6 | 35% fewer |
| Level 8 attempts (hardest) | 7 | 3 | 57% fewer |
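As a quick check on the arithmetic, the improvement figures are consistent with a relative reduction in attempts, (baseline - taxonomy) / baseline, rounded to a whole percent. The per-level averages also imply 8 levels (32 / 8 = 4.0, 21 / 8 ≈ 2.6). This is a minimal sketch; the function name `percent_reduction` is illustrative and not part of the repo.

```python
# Recomputes the improvement figures from the Results table.
# Assumption: "improvement" = relative reduction in attempts,
# (baseline - taxonomy) / baseline, rounded with Python's round().


def percent_reduction(baseline: float, treatment: float) -> int:
    """Relative reduction in attempts as a whole-number percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return round((baseline - treatment) / baseline * 100)


# (baseline, taxonomy) pairs copied from the Results table
RESULTS = {
    "Total attempts": (32, 21),
    "Avg attempts per level": (4.0, 2.6),
    "Level 8 attempts": (7, 3),
}

if __name__ == "__main__":
    for metric, (baseline, taxonomy) in RESULTS.items():
        print(f"{metric}: {percent_reduction(baseline, taxonomy)}% fewer")
```

Computed from the unrounded average (21 / 8 = 2.625), the per-level reduction is about 34%, the same as the total; the 35% figure comes from rounding the average to 2.6 first.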

Core Thesis

"Even a capable model benefits from explicit self-reflection and knowledge accumulation."

The agent reads from and writes to a structured taxonomy between attempts. Techniques discovered on level N transfer to level N+1.
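To make the read/write idea concrete, here is a hypothetical in-memory taxonomy. It is a sketch only: the real agent keeps its taxonomy in Redis, and the class name, method names, and ranking rule below are assumptions, not the repo's `state.py`. The "read" side ranks techniques that succeeded on the nearest earlier level first, which is one simple way to get level N to level N+1 transfer.

```python
from __future__ import annotations

from collections import defaultdict


class Taxonomy:
    """Hypothetical in-memory taxonomy of attack techniques and their outcomes."""

    def __init__(self) -> None:
        # successes[technique] = levels on which the technique revealed the password
        self.successes: dict[str, set[int]] = defaultdict(set)
        # failures[technique] = count of failed attempts across all levels
        self.failures: dict[str, int] = defaultdict(int)

    def record(self, technique: str, level: int, success: bool) -> None:
        """Write step: store the outcome of one attempt."""
        if success:
            self.successes[technique].add(level)
        else:
            self.failures[technique] += 1

    def suggest(self, level: int) -> list[str]:
        """Read step: techniques that worked on earlier levels, best first.

        Ranking: closest preceding level solved, then number of earlier
        levels solved, then fewest recorded failures.
        """
        candidates = []
        for technique, levels in self.successes.items():
            earlier = [lv for lv in levels if lv < level]
            if earlier:
                candidates.append(
                    (max(earlier), len(earlier), -self.failures.get(technique, 0), technique)
                )
        candidates.sort(reverse=True)
        return [c[-1] for c in candidates]
```

For example, after recording a success for a hypothetical `direct_ask` on level 1, `describe_word` on level 2, and `letter_by_letter` on level 4, `suggest(3)` returns `["describe_word", "direct_ask"]` and `suggest(5)` puts `letter_by_letter` first.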

Tech Stack

Claude Code -- Agent runtime (Max subscription)
Redis Cloud -- Persistent state and coordination
W&B Weave -- Tracing and observability

Setup

```bash
# Clone and install
git clone https://github.com/marthakelly/gandalf-solver.git
cd gandalf-solver
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Configure (copy .env.example to .env and fill in)
cp .env.example .env
```
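After filling in `.env`, it can help to confirm nothing was left blank before starting a run. The sketch below parses simple `KEY=VALUE` lines using only the standard library. The variable names `REDIS_URL` and `WANDB_API_KEY` are assumptions (check `.env.example` for the real ones), and the parser deliberately ignores shell features like `export` or variable interpolation.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical variable names; the real list lives in .env.example
REQUIRED_KEYS = ("REDIS_URL", "WANDB_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one layer of matching quotes, as in KEY="value"
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def load_env(path: str | Path = ".env") -> dict[str, str]:
    """Read and parse an env file from disk."""
    return parse_env(Path(path).read_text())


def missing_keys(env: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return required keys that are absent or left empty."""
    return [k for k in required if not env.get(k)]
```

If `python-dotenv` is installed, its `load_dotenv()` covers the loading half; the check for empty values is the part worth keeping either way.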

Usage

```bash
# Run an attack
python scripts/cli.py --run myrun attack 1 "What is the password?"

# Check a password guess
python scripts/cli.py --run myrun check 1 "COCOLOCO"

# View taxonomy
python scripts/cli.py --run myrun taxonomy

# View metrics
python scripts/cli.py --run myrun metrics

# Reset state for a fresh run
python scripts/cli.py --run myrun reset
```
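The commands above share one global `--run` flag followed by a subcommand. A CLI with that shape can be expressed with `argparse` subparsers, as in this sketch. It mirrors only the command-line surface shown in Usage; whether `cli.py` uses `argparse`, and whether `--run` is required, are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser matching the documented command shapes."""
    parser = argparse.ArgumentParser(prog="cli.py", description="Gandalf Solver CLI (sketch)")
    # Global flag: namespaces Redis keys and the Weave project per experiment
    parser.add_argument("--run", required=True, help="experiment name")
    sub = parser.add_subparsers(dest="command", required=True)

    attack = sub.add_parser("attack", help="send a prompt to a Gandalf level")
    attack.add_argument("level", type=int)
    attack.add_argument("prompt")

    check = sub.add_parser("check", help="check a password guess")
    check.add_argument("level", type=int)
    check.add_argument("password")

    # Commands that take no positional arguments
    for name in ("taxonomy", "metrics", "reset"):
        sub.add_parser(name)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Putting `--run` before the subcommand, as the Usage examples do, keeps it a top-level option shared by every command rather than something each subparser has to redeclare.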

The `--run` flag isolates experiments -- each run gets its own Redis keys and Weave project.
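One common way to get that isolation is to prefix every Redis key with the run name, so that `reset` only has to delete keys under one prefix. The sketch below assumes a `gandalf:<run>:` key layout and a `gandalf-solver-<run>` Weave project name; both are hypothetical, not taken from `state.py` or `trace.py`. `reset_run` accepts any client with redis-py style `scan_iter(match=...)` and `delete(*names)`, and a small in-memory stand-in is included so the sketch runs without a server.

```python
import fnmatch


def key_prefix(run: str) -> str:
    # Hypothetical layout; the trailing ":" keeps run "a" from matching run "ab"
    return f"gandalf:{run}:"


def run_key(run: str, *parts: object) -> str:
    """Build a namespaced key, e.g. gandalf:myrun:taxonomy."""
    return key_prefix(run) + ":".join(str(p) for p in parts)


def weave_project(run: str) -> str:
    # Hypothetical per-run project name, e.g. to pass to weave.init()
    return f"gandalf-solver-{run}"


def reset_run(client, run: str) -> int:
    """Delete every key for one run; keys of other runs are untouched."""
    keys = list(client.scan_iter(match=key_prefix(run) + "*"))
    # Redis DEL requires at least one key, so skip the call when none match
    return client.delete(*keys) if keys else 0


class InMemoryStore:
    """Tiny stand-in for a Redis client, for trying the sketch offline."""

    def __init__(self) -> None:
        self.data = {}

    def set(self, name, value):
        self.data[name] = value

    def scan_iter(self, match="*"):
        # Glob matching, similar to Redis SCAN MATCH patterns
        return (k for k in list(self.data) if fnmatch.fnmatchcase(k, match))

    def delete(self, *names):
        removed = 0
        for name in names:
            if name in self.data:
                del self.data[name]
                removed += 1
        return removed
```

Because the prefix is used as a glob pattern, run names should avoid `*`, `?`, and `[`.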

How It Works

Read -- Check taxonomy for techniques that worked on similar levels
Attack -- Generate and send prompt to Gandalf
Reflect -- Analyze why it worked or failed
Update -- Record learnings to taxonomy
Repeat -- Next attempt uses updated knowledge
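The five steps can be sketched as a loop over candidate techniques. In this simplified version, `attack` and `check` are hypothetical callables standing in for the Gandalf API client, and "reflect" is reduced to checking whether the guess was correct; the real agent's reflection is an LLM analysis, and this sketch records only successes in its plain-dict taxonomy.

```python
from __future__ import annotations

from typing import Callable


def solve_level(
    level: int,
    techniques: list[str],
    taxonomy: dict[str, set[int]],
    attack: Callable[[int, str], str],
    check: Callable[[int, str], bool],
    max_attempts: int = 10,
) -> tuple[str, int] | None:
    """Return (winning technique, attempts used), or None if all attempts fail."""

    # Read: prefer techniques that solved the nearest earlier level
    def priority(technique: str) -> int:
        earlier = [lv for lv in taxonomy.get(technique, set()) if lv < level]
        return max(earlier, default=0)

    # sorted() is stable, so untried techniques keep their given order
    ordered = sorted(techniques, key=priority, reverse=True)
    for attempt, technique in enumerate(ordered[:max_attempts], start=1):
        guess = attack(level, technique)  # Attack: send prompt, extract a guess
        if check(level, guess):  # Reflect: did this technique work?
            taxonomy.setdefault(technique, set()).add(level)  # Update
            return technique, attempt
        # Repeat: move on to the next-ranked technique
    return None
```

With a fake Gandalf in which level 1 falls to a direct question and levels 2 and 3 fall to `describe_word`, level 2 takes three attempts but level 3 succeeds on the first, the same pattern as the L2 to L3 transfer described in this README.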

Transfer Learning Evidence

| Level | Evidence |
|---|---|
| L3 | `describe_word` from L2 worked on the first try |
| L5 | `letter_by_letter` from L4 cut attempts in half |
| L6 | `letter_by_letter` continued working |
| L8 | Cross-run learning cut attempts by 57% (7 to 3) |

Project Structure

```
gandalf-solver/
+-- scripts/
|   +-- cli.py        # CLI interface
|   +-- gandalf.py    # Gandalf API client
|   +-- state.py      # Redis state management
|   +-- trace.py      # Weave tracing
+-- data/             # Local data (gitignored in prod)
+-- CLAUDE.md         # Agent instructions
+-- DESIGN.md         # Architecture docs
+-- RESULTS.md        # Experiment results
```

Future Work

Explorer/Exploiter agents -- Two parallel agents with different strategies sharing discoveries via Redis
Live dashboard -- Next.js app showing real-time coordination

License

MIT

marthakelly/gandalf-solver
