Simon Willison's Weblog - http.pieter.net

Recent

Jan. 30, 2026

We gotta talk about AI as a programming tool for the arts. Chris Ashworth is the creator and CEO of QLab, a macOS software package for "cue-based, multimedia playback" which is designed to automate lighting and audio for live theater productions.

I recently started following him on TikTok where he posts about his business and theater automation in general - QLab own the Voxel theater in Baltimore which they use as a combined performance venue and research lab, and the resulting videos offer a fascinating glimpse into a world I know virtually nothing about.

This latest TikTok describes his Claude Opus moment, after he used Claude Code to build a custom lighting design application for a very niche project and put together a useful application in just a few days that he would never have been able to spare the time for otherwise.

Chris works full time in the arts and comes at generative AI from a position of rational distrust. It's interesting to see him working through that tension to acknowledge that there are valuable applications here to build tools for the community he serves.

I have been at least gently skeptical about all this stuff for the last two years. Every time I checked in on it, I thought it was garbage, wasn't interested in it, wasn't useful. [...] But as a programmer, if you hear something like, this is changing programming, it's important to go check it out once in a while. So I went and checked it out a few weeks ago. And it's different. It's astonishing. [...]

One thing I learned in this exercise is that it can't make you a fundamentally better programmer than you already are. It can take a person who is a bad programmer and make them faster at making bad programs. And I think it can take a person who is a good programmer and, from what I've tested so far, make them faster at making good programs. [...] You see programmers out there saying, "I'm shipping code I haven't looked at and don't understand." I'm terrified by that. I think that's awful. But if you're capable of understanding the code that it's writing, and directing, designing, editing, deleting, being quality control on it, it's kind of astonishing. [...]

The positive thing I see here, and I think is worth coming to terms with, is this is an application that I would never have had time to write as a professional programmer. Because the audience is three people. [...] There's no way it was worth it to me to spend my energy of 20 years designing and implementing software for artists to build an app for three people that is this level of polish. And it took me a few days. [...]

I know there are a lot of people who really hate this technology, and in some ways I'm among them. But I think we've got to come to terms with this is a career-changing moment. And I really hate that I'm saying that because I didn't believe it for the last two years. [...] It's like having a room full of power tools. I wouldn't want to send an untrained person into a room full of power tools because they might chop off their fingers. But if someone who knows how to use tools has the option to have both hand tools and a power saw and a power drill and a lathe, there's a lot of work they can do with those tools at a lot faster speed.

# 3:51 am / theatre, ai, generative-ai, llms, ai-assisted-programming, tiktok, ai-ethics, coding-agents, claude-code

Jan. 29, 2026

Datasette 1.0a24. New Datasette alpha this morning. Key new features:

Datasette's Request object can now handle multipart/form-data file uploads via the new await request.form(files=True) method. I plan to use this for a datasette-files plugin to support attaching files to rows of data.
The recommended development environment for hacking on Datasette itself now uses uv. Crucially, you can clone Datasette and run uv run pytest to run the tests without needing to manually create a virtual environment or install dependencies first, thanks to the dev dependency group pattern.
A new ?_extra=render_cell parameter for both table and row JSON pages to return the results of executing the render_cell() plugin hook. This should unlock new JavaScript UI features in the future.

More details in the release notes. I also invested a bunch of work in eliminating flaky tests that were intermittently failing in CI - I think those are all handled now.

# 5:21 pm / projects, python, datasette, annotated-release-notes, uv

Jan. 28, 2026

Adding dynamic features to an aggressively cached website

My blog uses aggressive caching: it sits behind Cloudflare with a 15 minute cache header, which guarantees it can survive even the largest traffic spike to any given page. I've recently added a couple of dynamic features that work in spite of that full-page caching. Here's how those work.

[... 1,145 words]

10:10 pm / caching, django, javascript, localstorage, ai, cloudflare, generative-ai, llms, ai-assisted-programming

The Five Levels: from Spicy Autocomplete to the Dark Factory. Dan Shapiro proposes a five level model of AI-assisted programming, inspired by the five (or rather six, it's zero-indexed) levels of driving automation.

0. Spicy autocomplete, aka original GitHub Copilot or copying and pasting snippets from ChatGPT.
1. The coding intern, writing unimportant snippets and boilerplate with full human review.
2. The junior developer, pair programming with the model but still reviewing every line.
3. The developer. Most code is generated by AI, and you take on the role of full-time code reviewer.
4. The engineering team. You're more of an engineering manager or product/program/project manager. You collaborate on specs and plans, the agents do the work.
5. The dark software factory, like a factory run by robots where the lights are out because robots don't need to see.

Dan says about that last category:

At level 5, it's not really a car any more. You're not really running anybody else's software any more. And your software process isn't really a software process any more. It's a black box that turns specs into software.

Why Dark? Maybe you've heard of the Fanuc Dark Factory, the robot factory staffed by robots. It's dark, because it's a place where humans are neither needed nor welcome.

I know a handful of people who are doing this. They're small teams, less than five people. And what they're doing is nearly unbelievable -- and it will likely be our future.

I've talked to one team that's doing the pattern hinted at here. It was fascinating. The key characteristics:

Nobody reviews AI-produced code, ever. They don't even look at it.
The goal of the system is to prove that the system works. A huge amount of the coding agent work goes into testing and tooling and simulating related systems and running demos.
The role of the humans is to design that system - to find new patterns that can help the agents work more effectively and demonstrate that the software they are building is robust and effective.

It was a tiny team and the stuff they had built in just a few months looked very convincing to me. Some of them had 20+ years of experience as software developers working on systems with high reliability requirements, so they were not approaching this from a naive perspective.

I'm hoping they come out of stealth soon because I can't really share more details than this.

# 9:44 pm / ai, generative-ai, llms, ai-assisted-programming, coding-agents

Jan. 27, 2026

One Human + One Agent = One Browser From Scratch (via) embedding-shapes was so infuriated by the hype around Cursor's FastRender browser project - thousands of parallel agents producing ~1.6 million lines of Rust - that they were inspired to take a go at building a web browser using coding agents themselves.

The result is one-agent-one-browser and it's really impressive. Over three days they drove a single Codex CLI agent to build 20,000 lines of Rust that successfully renders HTML+CSS with no Rust crate dependencies at all - though it does (reasonably) use Windows, macOS and Linux system frameworks for image and text rendering.

I installed the 1MB macOS binary release and ran it against my blog:

chmod 755 ~/Downloads/one-agent-one-browser-macOS-ARM64
~/Downloads/one-agent-one-browser-macOS-ARM64 https://simonwillison.net/

Here's the result:

It even rendered my SVG feed subscription icon! A PNG image is missing from the page, which looks like an intermittent bug (there's code to render PNGs).

The code is pretty readable too - here's the flexbox implementation.

I had thought that "build a web browser" was the ideal prompt to really stretch the capabilities of coding agents - and that it would take sophisticated multi-agent harnesses (as seen in the Cursor project) and millions of lines of code to achieve.

Turns out one agent driven by a talented engineer, three days and 20,000 lines of Rust is enough to get a very solid basic renderer working!

I'm going to upgrade my prediction for 2029: I think we're going to get a production-grade web browser built by a small team using AI assistance by then.

# 4:58 pm / browsers, predictions, ai, rust, generative-ai, llms, ai-assisted-programming, coding-agents, codex-cli, browser-challenge

Kimi K2.5: Visual Agentic Intelligence (via) Kimi K2 landed in July as a 1 trillion parameter open weight LLM. It was joined by Kimi K2 Thinking in November which added reasoning capabilities. Now they've made it multi-modal: the K2 models were text-only, but the new 2.5 can handle image inputs as well:

Kimi K2.5 builds on Kimi K2 with continued pretraining over approximately 15T mixed visual and text tokens. Built as a native multimodal model, K2.5 delivers state-of-the-art coding and vision capabilities and a self-directed agent swarm paradigm.

The "self-directed agent swarm paradigm" claim there means improved long-sequence tool calling and training on how to break down tasks for multiple agents to work on at once:

For complex tasks, Kimi K2.5 can self-direct an agent swarm with up to 100 sub-agents, executing parallel workflows across up to 1,500 tool calls. Compared with a single-agent setup, this reduces execution time by up to 4.5x. The agent swarm is automatically created and orchestrated by Kimi K2.5 without any predefined subagents or workflow.
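
The fan-out pattern described in that quote can be sketched roughly as follows. This is not Kimi's actual orchestration: the `run_subtask` and `run_swarm` names are hypothetical, `asyncio.sleep` stands in for a real tool call, and a semaphore is just one simple way to cap how many sub-agents work at once.

```python
import asyncio


async def run_subtask(task_id: int, semaphore: asyncio.Semaphore) -> str:
    """Hypothetical stand-in for one sub-agent handling one task."""
    async with semaphore:  # wait for a free sub-agent slot
        await asyncio.sleep(0.01)  # placeholder for a real tool call
        return f"task-{task_id}: done"


async def run_swarm(task_count: int, max_agents: int) -> list[str]:
    """Run every task concurrently, with at most max_agents in flight."""
    semaphore = asyncio.Semaphore(max_agents)
    # gather() returns results in the order the tasks were submitted
    return await asyncio.gather(
        *(run_subtask(i, semaphore) for i in range(task_count))
    )


results = asyncio.run(run_swarm(task_count=10, max_agents=4))
```

With ten tasks and four slots the simulated work finishes in about three rounds of waiting rather than ten, which is the kind of wall-clock saving the quote is claiming at a much larger scale.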

I used the OpenRouter Chat UI to have it "Generate an SVG of a pelican riding a bicycle", and it did quite well:

As a more interesting test, I decided to exercise the claims around multi-agent planning with this prompt:

I want to build a Datasette plugin that offers a UI to upload files to an S3 bucket and stores information about them in a SQLite table. Break this down into ten tasks suitable for execution by parallel coding agents.

Here's the full response. It produced ten realistic tasks and reasoned through the dependencies between them. For comparison here's the same prompt against Claude Opus 4.5 and against GPT-5.2 Thinking.

The Hugging Face repository is 595GB. The model uses Kimi's janky "modified MIT" license, which adds the following clause:

Our only modification part is that, if the Software (or any derivative works thereof) is used for any of your commercial products or services that have more than 100 million monthly active users, or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2.5" on the user interface of such product or service.

Given the model's size, I expect one way to run it locally would be with MLX and a pair of $10,000 512GB RAM M3 Ultra Mac Studios. That setup has been demonstrated to work with previous trillion parameter K2 models.
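
The sizing reasoning there can be checked with some quick arithmetic. This sketch assumes, as a simplification, that the weights need about as much memory as the 595GB repository occupies on disk. It ignores KV cache and operating system overhead, so the real headroom would be smaller.

```python
# Figures quoted in the post above
MODEL_SIZE_GB = 595       # size of the Hugging Face repository
RAM_PER_MACHINE_GB = 512  # RAM in one M3 Ultra Mac Studio
MACHINE_COUNT = 2


def fits_in_memory(model_gb: int, ram_per_machine_gb: int, machines: int) -> bool:
    """Simplified check: do the weights fit in the combined RAM?"""
    return model_gb <= ram_per_machine_gb * machines


total_ram_gb = RAM_PER_MACHINE_GB * MACHINE_COUNT  # combined RAM across both machines
headroom_gb = total_ram_gb - MODEL_SIZE_GB         # what is left for everything else

print(fits_in_memory(MODEL_SIZE_GB, RAM_PER_MACHINE_GB, 1))  # one machine
print(fits_in_memory(MODEL_SIZE_GB, RAM_PER_MACHINE_GB, MACHINE_COUNT))  # the pair
print(headroom_gb)
```

Under those assumptions a single 512GB machine is not enough, while the pair leaves a few hundred GB spare.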

# 3:07 pm / ai, llms, hugging-face, vision-llms, llm-tool-use, ai-agents, pelican-riding-a-bicycle, llm-release, ai-in-china, moonshot, parallel-agents, kimi, janky-licenses

Jan. 26, 2026

Someone asked on Hacker News if I had any tips for getting coding agents to write decent quality tests. Here's what I said:

I work in Python which helps a lot because there are a TON of good examples of pytest tests floating around in the training data, including things like usage of fixture libraries for mocking external HTTP APIs and snapshot testing and other neat patterns.

Or I can say "use pytest-httpx to mock the endpoints" and Claude knows what I mean.

Keeping an eye on the tests is important. The most common anti-pattern I see is large amounts of duplicated test setup code - which isn't a huge deal, I'm much more tolerant of duplicated logic in tests than I am in implementation, but it's still worth pushing back on.

"Refactor those tests to use pytest.mark.parametrize" and "extract the common setup into a pytest fixture" work really well there.

Generally though the best way to get good tests out of a coding agent is to make sure it's working in a project with an existing test suite that uses good patterns. Coding agents pick the existing patterns up without needing any extra prompting at all.

I find that once a project has clean basic tests the new tests added by the agents tend to match them in quality. It's similar to how working on large projects with a team of other developers works - keeping the code clean means when people look for examples of how to write a test they'll be pointed in the right direction.

One last tip I use a lot is this:

Clone datasette/datasette-enrichments from GitHub to /tmp and imitate the testing patterns it uses

I do this all the time with different existing projects I've written - the quickest way to show an agent how you like something to be done is to have it look at an example.
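
As an illustration of the two refactoring prompts quoted above, here is a minimal sketch of what the result might look like. The `slugify` function and the `posts` table are hypothetical stand-ins, not taken from any real project.

```python
import sqlite3

import pytest


def slugify(title: str) -> str:
    """Hypothetical function under test: lowercase, words joined by hyphens."""
    return "-".join(title.lower().split())


# One parametrized test replaces several near-identical copies
@pytest.mark.parametrize(
    "title,expected",
    [
        ("Hello World", "hello-world"),
        ("  Extra   spaces  ", "extra-spaces"),
        ("single", "single"),
    ],
)
def test_slugify(title, expected):
    assert slugify(title) == expected


# Common setup lives in a fixture instead of being repeated in each test
@pytest.fixture
def db():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table posts (title text, slug text)")
    yield conn
    conn.close()


def test_insert_post(db):
    db.execute(
        "insert into posts values (?, ?)",
        ("Hello World", slugify("Hello World")),
    )
    assert db.execute("select slug from posts").fetchone() == ("hello-world",)
```

Saved as a test file and run with pytest, the parametrized test is reported as three separate cases, and any test that accepts a `db` argument gets a fresh in-memory database.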

# 11:55 pm / testing, coding-agents, python, generative-ai, ai, llms, hacker-news, pytest

ChatGPT Containers can now run bash, pip/npm install packages, and download files

One of my favourite features of ChatGPT is its ability to write and execute code in a container. This feature launched as ChatGPT Code Interpreter nearly three years ago, was half-heartedly rebranded to "Advanced Data Analysis" at some point and is generally really difficult to find detailed documentation about. Case in point: it appears to have had a massive upgrade at some point in the past few months, and I can't find documentation about the new capabilities anywhere!

[... 3,019 words]

7:19 pm / pypi, sandboxing, npm, ai, openai, generative-ai, chatgpt, llms, ai-assisted-programming, code-interpreter

Jan. 25, 2026

the browser is the sandbox. Paul Kinlan is a web platform developer advocate at Google and recently turned his attention to coding agents. He quickly identified the importance of a robust sandbox for agents to operate in and put together these detailed notes on how the web browser can help:

This got me thinking about the browser. Over the last 30 years, we have built a sandbox specifically designed to run incredibly hostile, untrusted code from anywhere on the web, the instant a user taps a URL. [...]

Could you build something like Cowork in the browser? Maybe. To find out, I built a demo called Co-do that tests this hypothesis. In this post I want to discuss the research I've done to see how far we can get, and determine if the browser's ability to run untrusted code is useful (and good enough) for enabling software to do more for us directly on our computer.

Paul then describes how the three key aspects of a sandbox - filesystem, network access and safe code execution - can be handled by browser technologies: the File System Access API (still Chrome-only as far as I can tell), CSP headers with </code> and WebAssembly in Web Workers.</p> <p>Co-do is a very interesting demo that illustrates all of these ideas in a single application:</p> <p></p> <p>You select a folder full of files and configure an LLM provider and set an API key, Co-do then uses CSP-approved API calls to interact with that provider and provides a chat interface with tools for interacting with those files. It does indeed feel similar to <a href="/simonwillison.net/2026/Jan/12/claude-cowork/">Claude Cowork</a> but without running a multi-GB local container to provide the sandbox.</p> <p>My biggest complaint about <code><iframe sandbox></code> remains how thinly documented it is, especially across different browsers. Paul's post has all sorts of useful details on that which I've not encountered elsewhere, including a complex <a href="/aifoc.us/the-browser-is-the-sandbox/#the-double-iframe-technique">double-iframe technique</a> to help apply network rules to the inner of the two frames.</p> <p>Thanks to this post I also learned about the <code><input type="file" webkitdirectory></code> tag which turns out to work on Firefox, Safari <em>and</em> Chrome and allows a browser read-only access to a full directory of files at once. 
I had Claude knock up a <a href="/tools.simonwillison.net/webkitdirectory">webkitdirectory demo</a> to try it out and I'll certainly be using it for projects in the future.</p> <p></p> <p> <a href="/simonwillison.net/2026/Jan/25/the-browser-is-the-sandbox/">#</a> <a href="/simonwillison.net/2026/Jan/25/the-browser-is-the-sandbox/">11:51 pm</a> / <a href="/simonwillison.net/tags/browsers/">browsers</a>, <a href="/simonwillison.net/tags/javascript/">javascript</a>, <a href="/simonwillison.net/tags/sandboxing/">sandboxing</a>, <a href="/simonwillison.net/tags/ai/">ai</a>, <a href="/simonwillison.net/tags/generative-ai/">generative-ai</a>, <a href="/simonwillison.net/tags/llms/">llms</a>, <a href="/simonwillison.net/tags/ai-agents/">ai-agents</a>, <a href="/simonwillison.net/tags/coding-agents/">coding-agents</a>, <a href="/simonwillison.net/tags/claude-code/">claude-code</a> </p> </div> <div> <p><strong><a href="/www.doc.govt.nz/our-work/kakapo-recovery/what-we-do/kakapo-cam-rakiura-live-stream/">Kakapo Cam: Rakiura live stream</a></strong> (<a href="/www.metafilter.com/211927/The-only-parrot-to-have-a-polygynous-lek-breeding-system-sits-on-an-egg">via</a>) Critical update for this year's <a href="/simonwillison.net/2026/Jan/8/llm-predictions-for-2026/#1-year-k-k-p-parrots-will-have-an-outstanding-breeding-season">Kakapo breeding season</a>: the New Zealand Department of Conservation have a livestream running of Rakiura's nest!</p> <blockquote> <p>You're looking at the underground nest of 23-year-old Rakiura. She has chosen this same site to nest for all seven breeding seasons since 2008, a large cavity under a rata tree. Because she returns to the site so reliably, we've been able to make modifications over the years to keep it safe and dry, including adding a well-placed hatch for monitoring eggs and chicks.</p> </blockquote> <p>Rakiura is a legendary Kakapo:</p> <blockquote> <p>Rakiura hatched on 19 February 2002 on Whenua Hou/Codfish Island. 
She is the offspring of Flossie and Bill. Her name comes from the te reo Maori name for Stewart Island, the place where most of the founding kakapo population originated.</p> <p>Rakiura has nine living descendants, three females and six males, across six breeding seasons. In 2008 came Toitiiti, in 2009 Tamahou and Te Atapo, in 2011 Tia and Tutoko, in 2014 Taeatanga and Te Awa, in 2019 Mati-ma and Tautahi. She also has many grandchicks.</p> </blockquote> <p>She laid her first egg of the season at 4:30pm NZ time on 22nd January. The livestream went live shortly afterwards, once she committed to this nest.</p> <p>The stream is <a href="/www.youtube.com/watch?v=BfGL7A2YgUY">on YouTube</a>. I <a href="/gisthost.github.io/?dc78322de89a2191c593215f109c65d7/index.html">used Claude Code</a> to write <a href="/tools.simonwillison.net/python/#livestream-gifpy">a livestream-gif.py script</a> and used that to capture this sped-up video of the last few hours of footage, within which you can catch a glimpse of the egg!</p> </p> <p> <a href="/simonwillison.net/2026/Jan/25/kakapo-cam/">#</a> <a href="/simonwillison.net/2026/Jan/25/kakapo-cam/">4:53 am</a> / <a href="/simonwillison.net/tags/youtube/">youtube</a>, <a href="/simonwillison.net/tags/kakapo/">kakapo</a>, <a href="/simonwillison.net/tags/conservation/">conservation</a>, <a href="/simonwillison.net/tags/claude-code/">claude-code</a> </p> </div> <h3>Jan. 24, 2026</h3> <div> <p><strong><a href="/www.youtube.com/watch?v=4u94juYwLLM">Don't "Trust the Process"</a></strong> (<a href="/twitter.com/jenny_wen/status/2014479445738893649">via</a>) Jenny Wen, Design Lead at Anthropic (and previously Director of Design at Figma) gave a provocative keynote at Hatch Conference in Berlin last September.</p> <p></p> <p>Jenny argues that the Design Process - user research leading to personas leading to user journeys leading to wireframes... 
all before anything gets built - may be outdated for today's world.</p> <blockquote> <p><strong>Hypothesis</strong>: In a world where anyone can make anything -- what matters is your ability to choose and curate what you make.</p> </blockquote> <p>In place of the Process, designers should lean into prototypes. AI makes these much more accessible and less time-consuming than they used to be.</p> <p>Watching this talk made me think about how AI-assisted programming significantly reduces the cost of building the <em>wrong</em> thing. Previously if the design wasn't right you could waste months of development time building in the wrong direction, which was a very expensive mistake. If a wrong direction wastes just a few days instead we can take more risks and be much more proactive in exploring the problem space.</p> <p>I've always been a compulsive prototyper though, so this is very much playing into my own existing biases!</p> <p> <a href="/simonwillison.net/2026/Jan/24/dont-trust-the-process/">#</a> <a href="/simonwillison.net/2026/Jan/24/dont-trust-the-process/">11:31 pm</a> / <a href="/simonwillison.net/tags/design/">design</a>, <a href="/simonwillison.net/tags/prototyping/">prototyping</a>, <a href="/simonwillison.net/tags/ai/">ai</a>, <a href="/simonwillison.net/tags/generative-ai/">generative-ai</a>, <a href="/simonwillison.net/tags/llms/">llms</a>, <a href="/simonwillison.net/tags/ai-assisted-programming/">ai-assisted-programming</a>, <a href="/simonwillison.net/tags/vibe-coding/">vibe-coding</a> </p> </div> <div> <blockquote cite="https://jasmi.news/p/claude-code"><p><strong>If you tell a friend they can now instantly create any app, they'll probably say "Cool! Now I need to think of an idea."</strong> Then they will forget about it, and never build a thing. The problem is not that your friend is horribly uncreative. It's that most people's problems are not software-shaped, and most won't notice even when they are. 
[...]</p> <p>Programmers are trained to see everything as a software-shaped problem: if you do a task three times, you should probably automate it with a script. <em>Rename every IMG_*.jpg file from the last week to hawaii2025_*.jpg</em>, they tell their terminal, while the rest of us painfully click and copy-paste. We are blind to the solutions we were never taught to see, asking for faster horses and never dreaming of cars.</p></blockquote> <p>-- <a href="/jasmi.news/p/claude-code">Jasmine Sun</a></span></p> <p> <a href="/simonwillison.net/2026/Jan/24/jasmine-sun/">#</a> <a href="/simonwillison.net/2026/Jan/24/jasmine-sun/">9:34 pm</a> / <a href="/simonwillison.net/tags/vibe-coding/">vibe-coding</a>, <a href="/simonwillison.net/tags/coding-agents/">coding-agents</a>, <a href="/simonwillison.net/tags/claude-code/">claude-code</a>, <a href="/simonwillison.net/tags/generative-ai/">generative-ai</a>, <a href="/simonwillison.net/tags/ai/">ai</a>, <a href="/simonwillison.net/tags/llms/">llms</a> </p> </div> <h3>Jan. 23, 2026</h3> <div> <h3><a href="/simonwillison.net/2026/Jan/23/fastrender/">Wilson Lin on FastRender: a browser built by thousands of parallel agents</a></h3> <div> <a href="/simonwillison.net/2026/Jan/23/fastrender/"></a> </div> <p> <p>Last week Cursor published <a href="/cursor.com/blog/scaling-agents">Scaling long-running autonomous coding</a>, an article describing their research efforts into coordinating large numbers of autonomous coding agents. One of the projects mentioned in the article was <a href="/github.com/wilsonzlin/fastrender">FastRender</a>, a web browser they built from scratch using their agent swarms. I wanted to learn more so I asked Wilson Lin, the engineer behind FastRender, if we could record a conversation about the project. That 47 minute video is <a href="/www.youtube.com/watch?v=bKrAcTf2pL4">now available on YouTube</a>. I've included some of the highlights below.</p> <span>[... 
<a href="/simonwillison.net/2026/Jan/23/fastrender/">2,243 words</a>]</span> </p> <div> <a href="/simonwillison.net/2026/Jan/23/fastrender/">9:26 pm</a> / <a href="/simonwillison.net/tags/browsers/">browsers</a>, <a href="/simonwillison.net/tags/youtube/">youtube</a>, <a href="/simonwillison.net/tags/ai/">ai</a>, <a href="/simonwillison.net/tags/generative-ai/">generative-ai</a>, <a href="/simonwillison.net/tags/llms/">llms</a>, <a href="/simonwillison.net/tags/ai-assisted-programming/">ai-assisted-programming</a>, <a href="/simonwillison.net/tags/coding-agents/">coding-agents</a>, <a href="/simonwillison.net/tags/cursor/">cursor</a>, <a href="/simonwillison.net/tags/parallel-agents/">parallel-agents</a>, <a href="/simonwillison.net/tags/browser-challenge/">browser-challenge</a> </div> </div>  <div> <blockquote cite="https://twitter.com/voooooogel/status/2014189072647078053"><p>[...] i was too busy with work to read anything, so i asked chatgpt to summarize some books on state formation, and it suggested circumscription theory. there was already the natural boundary of my computer hemming the towns in, and town mayors played the role of big men to drive conflict. so i just needed a way for them to fight. i slightly tweaked the allocation of claude max accounts to the towns from a demand-based to a fixed allocation system. towns would each get a fixed amount of tokens to start, but i added a soldier role that could attack and defend in raids to steal tokens from other towns. 
[...]</p></blockquote> <p>-- <a href="/twitter.com/voooooogel/status/2014189072647078053">Theia Vogel</a>, <span>Gas Town fan fiction</span></p> <p> <a href="/simonwillison.net/2026/Jan/23/theia-vogel/">#</a> <a href="/simonwillison.net/2026/Jan/23/theia-vogel/">9:13 am</a> / <a href="/simonwillison.net/tags/parallel-agents/">parallel-agents</a>, <a href="/simonwillison.net/tags/llms/">llms</a>, <a href="/simonwillison.net/tags/ai/">ai</a>, <a href="/simonwillison.net/tags/generative-ai/">generative-ai</a> </p> </div> <h3>Jan. 22, 2026</h3> <div> <p><strong><a href="/blog.exe.dev/ssh-host-header">SSH has no Host header</a></strong> (<a href="/lobste.rs/s/7oqiqi/ssh_has_no_host_header">via</a>) <a href="/exe.dev/">exe.dev</a> is a new hosting service that, for $20/month, gives you up to 25 VMs "that share 2 CPUs and 8GB RAM". Everything happens over SSH, including creating new VMs. Once configured you can sign into your exe.dev VMs like this:</p> <tt><code>ssh simon.exe.dev<br></code></tt> <p>Here's the clever bit: when you run the above command <code>exe.dev</code> signs you into your VM of that name... but they don't assign every VM its own IP address and SSH has no equivalent of the Host header, so how does their load balancer know <em>which</em> of your VMs to forward you on to?</p> <p>The answer is that while they don't assign a unique IP to every VM they <em>do</em> have enough IPs that they can ensure each of your VMs has an IP that is unique to your account.</p> <p>If I create two VMs they will each resolve to a separate IP address, each of which is shared with many other users. 
The underlying infrastructure then identifies my user account from my SSH public key and can determine which underlying VM to forward my SSH traffic to.</p> <p> <a href="/simonwillison.net/2026/Jan/22/ssh-has-no-host-header/">#</a> <a href="/simonwillison.net/2026/Jan/22/ssh-has-no-host-header/">11:57 pm</a> / <a href="/simonwillison.net/tags/dns/">dns</a>, <a href="/simonwillison.net/tags/hosting/">hosting</a>, <a href="/simonwillison.net/tags/ssh/">ssh</a> </p> </div> <div> <p><strong><a href="/qwen.ai/blog?id=qwen3tts-0115">Qwen3-TTS Family is Now Open Sourced: Voice Design, Clone, and Generation</a></strong> (<a href="/news.ycombinator.com/item?id=46719229">via</a>) I haven't been paying much attention to the state-of-the-art in speech generation models other than noting that they've got <em>really good</em>, so I can't speak for how notable this new release from Qwen is.</p> <p>From <a href="/github.com/QwenLM/Qwen3-TTS/blob/main/assets/Qwen3_TTS.pdf">the accompanying paper</a>:</p> <blockquote> <p>In this report, we present the Qwen3-TTS series, a family of advanced multilingual, controllable, robust, and streaming text-to-speech models. Qwen3-TTS supports state-of- the-art 3-second voice cloning and description-based control, allowing both the creation of entirely novel voices and fine-grained manipulation over the output speech. Trained on over 5 million hours of speech data spanning 10 languages, Qwen3-TTS adopts a dual-track LM architecture for real-time synthesis [...]. Extensive experiments indicate state-of-the-art performance across diverse objective and subjective benchmark (e.g., TTS multilingual test set, InstructTTSEval, and our long speech test set). 
To facilitate community research and development, we release both tokenizers and models under the Apache 2.0 license.</p> </blockquote> <p>To give an idea of size, <a href="/huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-Base">Qwen/Qwen3-TTS-12Hz-1.7B-Base</a> is 4.54GB on Hugging Face and <a href="/huggingface.co/Qwen/Qwen3-TTS-12Hz-0.6B-Base">Qwen/Qwen3-TTS-12Hz-0.6B-Base</a> is 2.52GB.</p> <p>The <a href="/huggingface.co/spaces/Qwen/Qwen3-TTS">Hugging Face demo</a> lets you try out the 0.6B and 1.7B models for free in your browser, including voice cloning:</p> <p></p> <p>I tried this out by recording myself reading <a href="/simonwillison.net/about/">my about page</a> and then having Qwen3-TTS generate audio of me reading the Qwen3-TTS announcement post. Here's the result:</p> <p></p> <p>It's important that everyone understands that voice cloning is now something that's available to anyone with a GPU and a few GBs of VRAM... or in this case a web browser that can access Hugging Face.</p> <p><strong>Update</strong>: Prince Canuma <a href="/x.com/Prince_Canuma/status/2014453857019904423">got this working</a> with his <a href="/pypi.org/project/mlx-audio/">mlx-audio</a> library. I <a href="/claude.ai/share/2e01ad60-ca38-4e14-ab60-74eaa45b2fbd">had Claude</a> turn that into <a href="/github.com/simonw/tools/blob/main/python/q3_tts.py">a CLI tool</a> which you can run with <code>uv</code> ike this:</p> <tt><code>uv run https://tools.simonwillison.net/python/q3_tts.py \<br> 'I am a pirate, give me your gold!' \<br> -i 'gruff voice' -o pirate.wav<br></code></tt> <p>The <code>-i</code> option lets you use a prompt to describe the voice it should use. 
On first run this downloads a 4.5GB model file from Hugging Face.</p> <p> <a href="/simonwillison.net/2026/Jan/22/qwen3-tts/">#</a> <a href="/simonwillison.net/2026/Jan/22/qwen3-tts/">5:42 pm</a> / <a href="/simonwillison.net/tags/text-to-speech/">text-to-speech</a>, <a href="/simonwillison.net/tags/ai/">ai</a>, <a href="/simonwillison.net/tags/generative-ai/">generative-ai</a>, <a href="/simonwillison.net/tags/hugging-face/">hugging-face</a>, <a href="/simonwillison.net/tags/uv/">uv</a>, <a href="/simonwillison.net/tags/qwen/">qwen</a>, <a href="/simonwillison.net/tags/mlx/">mlx</a>, <a href="/simonwillison.net/tags/prince-canuma/">prince-canuma</a>, <a href="/simonwillison.net/tags/ai-in-china/">ai-in-china</a> </p> </div> <div> <blockquote cite="https://news.ycombinator.com/item?id=46699072#46706040"><p>Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".</p> <p>For each frame our pipeline constructs a scene graph with React then:</p> <p>-> layout elements<br> -> rasterize them to a 2d screen<br> -> diff that against the previous screen<br> -> <em>finally</em> use the diff to generate ANSI sequences to draw</p> <p>We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written.</p></blockquote> <p>-- <a href="/news.ycombinator.com/item?id=46699072#46706040">Chris Lloyd</a>, <span>Claude Code team at Anthropic</span></p> <p> <a href="/simonwillison.net/2026/Jan/22/chris-lloyd/">#</a> <a href="/simonwillison.net/2026/Jan/22/chris-lloyd/">3:34 pm</a> / <a href="/simonwillison.net/tags/react/">react</a>, <a href="/simonwillison.net/tags/claude-code/">claude-code</a> </p> </div> <h3>Jan. 21, 2026</h3> <div> <p><strong><a href="/www.anthropic.com/news/claude-new-constitution">Claude's new constitution</a></strong>. 
Late last year Richard Weiss <a href="/www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document">found something interesting</a> while poking around with the just-released Claude Opus 4.5: he was able to talk the model into regurgitating a document which was <em>not</em> part of the system prompt but appeared instead to be baked in during training, and which described Claude's core values at great length.</p> <p>He called this leak the <strong>soul document</strong>, and Amanda Askell from Anthropic <a href="/simonwillison.net/2025/Dec/2/claude-soul-document/">quickly confirmed</a> that it was indeed part of Claude's training procedures.</p> <p>Today Anthropic made this official, <a href="/www.anthropic.com/news/claude-new-constitution">releasing that full "constitution" document</a> under a CC0 (effectively public domain) license. There's a lot to absorb! It's over 35,000 tokens, more than 10x the length of the <a href="/platform.claude.com/docs/en/release-notes/system-prompts#claude-opus-4-5">published Opus 4.5 system prompt</a>.</p> <p>One detail that caught my eye is the acknowledgements at the end, which include a list of <a href="/www.anthropic.com/constitution#acknowledgements">external contributors</a> who helped review the document. 
I was intrigued to note that two of the fifteen listed names are members of the Catholic clergy - <a href="/www.frbrendanmcguire.org/biography">Father Brendan McGuire</a> is a pastor in Los Altos with a Master's degree in Computer Science and Math and <a href="/en.wikipedia.org/wiki/Paul_Tighe">Bishop Paul Tighe</a> is an Irish Catholic bishop with a background in moral theology.</p> <p> <a href="/simonwillison.net/2026/Jan/21/claudes-new-constitution/">#</a> <a href="/simonwillison.net/2026/Jan/21/claudes-new-constitution/">11:39 pm</a> / <a href="/simonwillison.net/tags/ai/">ai</a>, <a href="/simonwillison.net/tags/generative-ai/">generative-ai</a>, <a href="/simonwillison.net/tags/llms/">llms</a>, <a href="/simonwillison.net/tags/anthropic/">anthropic</a>, <a href="/simonwillison.net/tags/claude/">claude</a>, <a href="/simonwillison.net/tags/amanda-askell/">amanda-askell</a>, <a href="/simonwillison.net/tags/ai-ethics/">ai-ethics</a>, <a href="/simonwillison.net/tags/ai-personality/">ai-personality</a> </p> </div> <h3>Jan. 20, 2026</h3> <div> <p><strong><a href="/www.simonpcouch.com/blog/2026-01-20-cc-impact/">Electricity use of AI coding agents</a></strong> (<a href="/news.ycombinator.com/item?id=46695415">via</a>) Previous work estimating the energy and water cost of LLMs has generally focused on the cost per prompt using a consumer-level system such as ChatGPT.</p> <p>Simon P. Couch notes that coding agents such as Claude Code use <em>way</em> more tokens in response to tasks, often burning through many thousands of tokens across many tool calls.</p> <p>As a heavy Claude Code user, Simon estimates his own usage at the equivalent of 4,400 "typical queries" to an LLM, for an equivalent of around $15-$20 in daily API token spend. 
He figures that to be about the same as running a dishwasher once or the daily energy used by a domestic refrigerator.</p> <p> <a href="/simonwillison.net/2026/Jan/20/electricity-use-of-ai-coding-agents/">#</a> <a href="/simonwillison.net/2026/Jan/20/electricity-use-of-ai-coding-agents/">11:11 pm</a> / <a href="/simonwillison.net/tags/ai/">ai</a>, <a href="/simonwillison.net/tags/generative-ai/">generative-ai</a>, <a href="/simonwillison.net/tags/llms/">llms</a>, <a href="/simonwillison.net/tags/ai-ethics/">ai-ethics</a>, <a href="/simonwillison.net/tags/ai-energy-usage/">ai-energy-usage</a>, <a href="/simonwillison.net/tags/coding-agents/">coding-agents</a>, <a href="/simonwillison.net/tags/claude-code/">claude-code</a> </p> </div> <div> <p><strong><a href="/ploum.net/2026-01-19-exam-with-chatbots.html">Giving University Exams in the Age of Chatbots</a></strong> (<a href="/lobste.rs/s/parmy3/giving_university_exams_age_chatbots">via</a>) Detailed and thoughtful description of an open-book and open-chatbot exam run by <a href="/fr.wikipedia.org/wiki/Lionel_Dricot">Ploum</a> at Ecole Polytechnique de Louvain for an "Open Source Strategies" class.</p> <p>Students were told they could use chatbots during the exam but they had to announce their intention to do so in advance, share their prompts and take full accountability for any mistakes they made.</p> <p>Only 3 out of 60 students chose to use chatbots. 
Ploum surveyed half of the class to help understand their motivations.</p> <p> <a href="/simonwillison.net/2026/Jan/20/giving-university-exams-in-the-age-of-chatbots/">#</a> <a href="/simonwillison.net/2026/Jan/20/giving-university-exams-in-the-age-of-chatbots/">5:51 pm</a> / <a href="/simonwillison.net/tags/education/">education</a>, <a href="/simonwillison.net/tags/ai/">ai</a>, <a href="/simonwillison.net/tags/generative-ai/">generative-ai</a>, <a href="/simonwillison.net/tags/llms/">llms</a>, <a href="/simonwillison.net/tags/ai-ethics/">ai-ethics</a> </p> </div> <h3>Jan. 19, 2026</h3> <div> <p><strong><a href="/github.com/jordanhubbard/nanolang">jordanhubbard/nanolang</a></strong> (<a href="/news.ycombinator.com/item?id=46684958">via</a>) Plenty of people have mused about what a new programming language specifically designed to be used by LLMs might look like. Jordan Hubbard (<a href="/en.wikipedia.org/wiki/Jordan_Hubbard">co-founder of FreeBSD</a>, with serious stints at Apple and NVIDIA) just released exactly that.</p> <blockquote> <p>A minimal, LLM-friendly programming language with mandatory testing and unambiguous syntax.</p> <p>NanoLang transpiles to C for native performance while providing a clean, modern syntax optimized for both human readability and AI code generation.</p> </blockquote> <p>The syntax strikes me as an interesting mix between C, Lisp and Rust.</p> <p>I decided to see if an LLM could produce working code in it directly, given the necessary context. I started with this <a href="/github.com/jordanhubbard/nanolang/blob/main/MEMORY.md">MEMORY.md</a> file, which begins:</p> <blockquote> <p><strong>Purpose:</strong> This file is designed specifically for Large Language Model consumption. It contains the essential knowledge needed to generate, debug, and understand NanoLang code. 
Pair this with <code>spec.json</code> for complete language coverage.</p> </blockquote> <p>I ran that using <a href="/llm.datasette.io/">LLM</a> and <a href="/github.com/simonw/llm-anthropic">llm-anthropic</a> like this:</p> <tt><code>llm -m claude-opus-4.5 \<br> -s https://raw.githubusercontent.com/jordanhubbard/nanolang/refs/heads/main/MEMORY.md \<br> 'Build me a mandelbrot fractal CLI tool in this language' <br> > /tmp/fractal.nano<br></code></tt> <p>The <a href="/gist.github.com/simonw/7847f022566d11629ec2139f1d109fb8#mandelbrot-fractal-cli-tool-in-nano">resulting code</a>... <a href="/gist.github.com/simonw/7847f022566d11629ec2139f1d109fb8?permalink_comment_id=5947465#gistcomment-5947465">did not compile</a>.</p> <p>I may have been too optimistic expecting a one-shot working program for a new language like this. So I ran a clone of the actual project, copied in my program and had Claude Code take a look at the failing compiler output.</p> <p>... and it worked! Claude happily grepped its way through the various <code>examples/</code> and built me a working program.</p> <p>Here's <a href="/gisthost.github.io/?9696da6882cb6596be6a9d5196e8a7a5/index.html">the Claude Code transcript</a> - you can see it <a href="/gisthost.github.io/?9696da6882cb6596be6a9d5196e8a7a5/page-001.html#msg-2026-01-19T23-43-09-675Z">reading relevant examples here</a> - and here's <a href="/gist.github.com/simonw/e7f3577adcfd392ab7fa23b1295d00f2">the finished code plus its output</a>.</p> <p>I've suspected <a href="/simonwillison.net/2025/Nov/7/llms-for-new-programming-languages/">for a while</a> that LLMs and coding agents might significantly reduce the friction involved in launching a new language. 
This result reinforces my opinion.</p> <p> <a href="/simonwillison.net/2026/Jan/19/nanolang/">#</a> <a href="/simonwillison.net/2026/Jan/19/nanolang/">11:58 pm</a> / <a href="/simonwillison.net/tags/programming-languages/">programming-languages</a>, <a href="/simonwillison.net/tags/ai/">ai</a>, <a href="/simonwillison.net/tags/generative-ai/">generative-ai</a>, <a href="/simonwillison.net/tags/llms/">llms</a>, <a href="/simonwillison.net/tags/ai-assisted-programming/">ai-assisted-programming</a>, <a href="/simonwillison.net/tags/llm/">llm</a>, <a href="/simonwillison.net/tags/coding-agents/">coding-agents</a>, <a href="/simonwillison.net/tags/claude-code/">claude-code</a> </p> </div> <div> <p><strong><a href="/cursor.com/blog/scaling-agents">Scaling long-running autonomous coding</a></strong>. Wilson Lin at Cursor has been doing some experiments to see how far you can push a large fleet of "autonomous" coding agents:</p> <blockquote> <p>This post describes what we've learned from running hundreds of concurrent agents on a single project, coordinating their work, and watching them write over a million lines of code and trillions of tokens.</p> </blockquote> <p>They ended up running planners and sub-planners to create tasks, then having workers execute on those tasks - similar to how Claude Code uses sub-agents. Each cycle ended with a judge agent deciding if the project was completed or not.</p> <p>In my predictions for 2026 <a href="/simonwillison.net/2026/Jan/8/llm-predictions-for-2026/#3-years-someone-will-build-a-new-browser-using-mainly-ai-assisted-coding-and-it-won-t-even-be-a-surprise">the other day</a> I said that by 2029:</p> <blockquote> <p>I think somebody will have built a full web browser mostly using AI assistance, and it won't even be surprising. Rolling a new web browser is one of the most complicated software projects I can imagine[...] the cheat code is the conformance suites. 
If there are existing tests that it'll get so much easier.</p> </blockquote> <p>I may have been off by three years, because Cursor chose "building a web browser from scratch" as their test case for their agent swarm approach:</p> <blockquote> <p>To test this system, we pointed it at an ambitious goal: building a web browser from scratch. The agents ran for close to a week, writing over 1 million lines of code across 1,000 files. You can explore <a href="/github.com/wilsonzlin/fastrender">the source code on GitHub</a>.</p> </blockquote> <p>But how well did they do? Their initial announcement a couple of days ago was met with <a href="/embedding-shapes.github.io/cursor-implied-success-without-evidence/">unsurprising skepticism</a>, especially when it became apparent that their GitHub Actions CI was failing and there were no build instructions in the repo.</p> <p>It looks like they addressed that within the past 24 hours. The <a href="/github.com/wilsonzlin/fastrender/blob/main/README.md#build-requirements">latest README</a> includes build instructions which I followed on macOS like this:</p> <tt><code>cd /tmp<br>git clone https://github.com/wilsonzlin/fastrender<br>cd fastrender<br>git submodule update --init vendor/ecma-rs<br>cargo run --release --features browser_ui --bin browser<br></code></tt> <p>This got me a working browser window! Here are screenshots I took of google.com and my own website:</p> <p></p> <p></p> <p>Honestly those are very impressive! 
You can tell they're not just wrapping an existing rendering engine because of those very obvious rendering glitches, but the pages are legible and look mostly correct.</p> <p>The FastRender repo even uses Git submodules <a href="/github.com/wilsonzlin/fastrender/tree/main/specs">to include various WhatWG and CSS-WG specifications</a> in the repo, which is a smart way to make sure the agents have access to the reference materials that they might need.</p> <p>This is the second attempt I've seen at building a full web browser using AI-assisted coding in the past two weeks - the first was <a href="/github.com/hiwavebrowser/hiwave">HiWave browser</a>, a new browser engine in Rust first announced <a href="/www.reddit.com/r/Anthropic/comments/1q4xfm0/over_christmas_break_i_wrote_a_fully_functional/">in this Reddit thread</a>.</p> <p>When I made my 2029 prediction this is more-or-less the quality of result I had in mind. I don't think we'll see projects of this nature compete with Chrome or Firefox or WebKit any time soon but I have to admit I'm very surprised to see something this capable emerge so quickly.</p> <p><strong>Update 23rd January 2026</strong>: I recorded a 47 minute conversation with Wilson about this project and published it on YouTube. 
Here's <a href="/simonwillison.net/2026/Jan/23/fastrender/">the video and accompanying highlights</a>.</p> <p> <a href="/simonwillison.net/2026/Jan/19/scaling-long-running-autonomous-coding/">#</a> <a href="/simonwillison.net/2026/Jan/19/scaling-long-running-autonomous-coding/">5:12 am</a> / <a href="/simonwillison.net/tags/browsers/">browsers</a>, <a href="/simonwillison.net/tags/ai/">ai</a>, <a href="/simonwillison.net/tags/generative-ai/">generative-ai</a>, <a href="/simonwillison.net/tags/llms/">llms</a>, <a href="/simonwillison.net/tags/ai-assisted-programming/">ai-assisted-programming</a>, <a href="/simonwillison.net/tags/coding-agents/">coding-agents</a>, <a href="/simonwillison.net/tags/cursor/">cursor</a>, <a href="/simonwillison.net/tags/parallel-agents/">parallel-agents</a>, <a href="/simonwillison.net/tags/conformance-suites/">conformance-suites</a>, <a href="/simonwillison.net/tags/browser-challenge/">browser-challenge</a> </p> </div> <h3>Jan. 18, 2026</h3> <div> <p><strong><a href="/github.com/antirez/flux2.c">FLUX.2-klein-4B Pure C Implementation</a></strong> (<a href="/news.ycombinator.com/item?id=46670279">via</a>) On 15th January Black Forest Labs, a lab formed by the creators of the original Stable Diffusion, released <a href="/huggingface.co/black-forest-labs/FLUX.2-klein-4B">black-forest-labs/FLUX.2-klein-4B</a> - an Apache 2.0 licensed 4 billion parameter version of their FLUX.2 family.</p> <p>Salvatore Sanfilippo (antirez) decided to build a pure C and dependency-free implementation to run the model, with assistance from Claude Code and Claude Opus 4.5.</p> <p>Salvatore shared <a href="/news.ycombinator.com/item?id=46670279#46671233">this note</a> on Hacker News:</p> <blockquote> <p>Something that may be interesting for the reader of this thread: this project was possible only once I started to tell Opus that it <em>needed</em> to take a file with all the implementation notes, and also accumulating all the things we discovered during the 
development process. And also, the file had clear instructions to be taken updated, and to be processed ASAP after context compaction. This kinda enabled Opus to do such a big coding task in a reasonable amount of time without loosing track. Check the file IMPLEMENTATION_NOTES.md in the GitHub repo for more info.</p> </blockquote> <p>Here's that <a href="/github.com/antirez/flux2.c/blob/main/IMPLEMENTATION_NOTES.md">IMPLEMENTATION_NOTES.md</a> file.</p> <p> <a href="/simonwillison.net/2026/Jan/18/flux2-klein-4b/">#</a> <a href="/simonwillison.net/2026/Jan/18/flux2-klein-4b/">11:58 pm</a> / <a href="/simonwillison.net/tags/c/">c</a>, <a href="/simonwillison.net/tags/salvatore-sanfilippo/">salvatore-sanfilippo</a>, <a href="/simonwillison.net/tags/ai/">ai</a>, <a href="/simonwillison.net/tags/stable-diffusion/">stable-diffusion</a>, <a href="/simonwillison.net/tags/generative-ai/">generative-ai</a>, <a href="/simonwillison.net/tags/llms/">llms</a>, <a href="/simonwillison.net/tags/ai-assisted-programming/">ai-assisted-programming</a>, <a href="/simonwillison.net/tags/text-to-image/">text-to-image</a>, <a href="/simonwillison.net/tags/coding-agents/">coding-agents</a>, <a href="/simonwillison.net/tags/claude-code/">claude-code</a> </p> </div> <h3>Jan. 17, 2026</h3> <div> <blockquote cite="https://twitter.com/dhh/status/2012543705161326941"><p><em>[On agents using CLI tools in place of REST APIs]</em> To save on context window, yes, but moreso to improve accuracy and success rate when multiple tool calls are involved, particularly when calls must be correctly chained e.g. for pagination, rate-limit backoff, and recognizing authentication failures.</p> <p>Other major factor: which models can wield the skill? Using the CLI lowers the bar so cheap, fast models (gpt-5-nano, haiku-4.5) can reliably succeed. 
Using the raw API is something only the costly "strong" models (gpt-5.2, opus-4.5) can manage, and it squeezes a ton of thinking/reasoning out of them, which means multiple turns/iterations, which means accumulating a ton of context, which means burning loads of expensive tokens. For one-off API requests and ad hoc usage driven by a developer, this is reasonable and even helpful, but for an autonomous agent doing repetitive work, it's a disaster.</p></blockquote> <p>-- <a href="/twitter.com/dhh/status/2012543705161326941">Jeremy Daer</a>, <span>37signals</span></p> <p> <a href="/simonwillison.net/2026/Jan/17/jeremy-daer/">#</a> <a href="/simonwillison.net/2026/Jan/17/jeremy-daer/">5:06 pm</a> / <a href="/simonwillison.net/tags/prompt-engineering/">prompt-engineering</a>, <a href="/simonwillison.net/tags/skills/">skills</a>, <a href="/simonwillison.net/tags/generative-ai/">generative-ai</a>, <a href="/simonwillison.net/tags/37-signals/">37-signals</a>, <a href="/simonwillison.net/tags/ai/">ai</a>, <a href="/simonwillison.net/tags/llms/">llms</a> </p> </div> <h3>Jan. 16, 2026</h3> <div> <p><strong><a href="/openai.com/index/our-approach-to-advertising-and-expanding-access/">Our approach to advertising and expanding access to ChatGPT</a></strong>. OpenAI's long-rumored introduction of ads to ChatGPT just became a whole lot more concrete:</p> <blockquote> <p>In the coming weeks, we're also planning to start testing ads in the U.S. for the free and Go tiers, so more people can benefit from our tools with fewer usage limits or without having to pay. Plus, Pro, Business, and Enterprise subscriptions will not include ads.</p> </blockquote> <p>What's "Go" tier, you might ask? That's a new $8/month tier that launched today in the USA, see <a href="/openai.com/index/introducing-chatgpt-go/">Introducing ChatGPT Go, now available worldwide</a>. 
It's a tier that they first trialed in India in August 2025 (here's a mention <a href="/help.openai.com/en/articles/6825453-chatgpt-release-notes#h_22cae6eb9f">in their release notes from August</a> listing a price of Rs399/month, which converts to around $4.40).</p> <p>I'm finding the new plan comparison grid on <a href="/chatgpt.com/pricing">chatgpt.com/pricing</a> pretty confusing. It lists all accounts as having access to GPT-5.2 Thinking, but doesn't clarify the limits that the free and Go plans have to conform to. It also lists different context windows for the different plans - 16K for free, 32K for Go and Plus and 128K for Pro. I had assumed that the 400,000 token window <a href="/platform.openai.com/docs/models/gpt-5.2">on the GPT-5.2 model page</a> applied to ChatGPT as well, but apparently I was mistaken.</p> <p><strong>Update</strong>: I've apparently not been paying attention: here's the Internet Archive ChatGPT pricing page from <a href="/web.archive.org/web/20250906071408/https://chatgpt.com/pricing">September 2025</a> showing those context limit differences as well.</p> <p>Back to advertising: my biggest concern has always been whether ads will influence the output of the chat directly. OpenAI assure us that they will not:</p> <blockquote> <ul> <li><strong>Answer independence</strong>: Ads do not influence the answers ChatGPT gives you. Answers are optimized based on what's most helpful to you. Ads are always separate and clearly labeled.</li> <li><strong>Conversation privacy</strong>: We keep your conversations with ChatGPT private from advertisers, and we never sell your data to advertisers.</li> </ul> </blockquote> <p>So what will they look like then? This screenshot from the announcement offers a useful hint:</p> <p></p> <p>The user asks about trips to Santa Fe, and an ad shows up for a cottage rental business there. 
This particular example imagines an option to start a direct chat with a bot aligned with that advertiser, at which point presumably the advertiser can influence the answers all they like!</p> <p> <a href="/simonwillison.net/2026/Jan/16/chatgpt-ads/">#</a> <a href="/simonwillison.net/2026/Jan/16/chatgpt-ads/">9:28 pm</a> / <a href="/simonwillison.net/tags/ads/">ads</a>, <a href="/simonwillison.net/tags/ai/">ai</a>, <a href="/simonwillison.net/tags/openai/">openai</a>, <a href="/simonwillison.net/tags/generative-ai/">generative-ai</a>, <a href="/simonwillison.net/tags/chatgpt/">chatgpt</a>, <a href="/simonwillison.net/tags/llms/">llms</a> </p> </div> <h3>Jan. 15, 2026</h3> <div> <p><strong><a href="/www.openresponses.org/">Open Responses</a></strong> (<a href="/twitter.com/reach_vb/status/2011863516852965565">via</a>) This is the standardization effort I've most wanted in the world of LLMs: a vendor-neutral specification for the JSON API that clients can use to talk to hosted LLMs.</p> <p>Open Responses aims to provide exactly that as a documented standard, derived from OpenAI's Responses API.</p> <p>I was hoping for one based on their older Chat Completions API since so many other products have cloned it already, but basing it on Responses does make sense since that API was designed with the features of more recent models - such as reasoning traces - baked into the design.</p> <p>What's certainly notable is the list of launch partners. OpenRouter alone means we can expect to be able to use this protocol with almost every existing model, and Hugging Face, LM Studio, vLLM, Ollama and Vercel cover a huge portion of the common tools used to serve models.</p> <p>For protocols like this I really want to see a comprehensive, language-independent conformance test suite. 
Open Responses has a subset of that - the official repository includes <a href="/github.com/openresponses/openresponses/blob/d0f23437b27845d5c3d0abaf5cb5c4a702f26b05/src/lib/compliance-tests.ts">src/lib/compliance-tests.ts</a> which can be used to exercise a server implementation, and is available as a React app <a href="/www.openresponses.org/compliance">on the official site</a> that can be pointed at any implementation served via CORS.</p> <p>What's missing is the equivalent for clients. I plan to spin up my own client library for this in Python and I'd really like to be able to run that against a conformance suite designed to check that my client correctly handles all of the details.</p> <p> <a href="/simonwillison.net/2026/Jan/15/open-responses/">#</a> <a href="/simonwillison.net/2026/Jan/15/open-responses/">11:56 pm</a> / <a href="/simonwillison.net/tags/json/">json</a>, <a href="/simonwillison.net/tags/standards/">standards</a>, <a href="/simonwillison.net/tags/ai/">ai</a>, <a href="/simonwillison.net/tags/openai/">openai</a>, <a href="/simonwillison.net/tags/generative-ai/">generative-ai</a>, <a href="/simonwillison.net/tags/llms/">llms</a>, <a href="/simonwillison.net/tags/openrouter/">openrouter</a>, <a href="/simonwillison.net/tags/conformance-suites/">conformance-suites</a> </p> </div> <div> <p><strong><a href="/fly.io/blog/design-and-implementation/">The Design & Implementation of Sprites</a></strong> (<a href="/twitter.com/tqbf/status/2011823480673624434">via</a>) I <a href="/simonwillison.net/2026/Jan/9/sprites-dev/">wrote about Sprites last week</a>. Here's Thomas Ptacek from Fly with the insider details on how they work under the hood.</p> <p>I like this framing of them as "disposable computers":</p> <blockquote> <p>Sprites are ball-point disposable computers. 
Whatever mark you mean to make, we've rigged it so you're never more than a second or two away from having a Sprite to do it with.</p> </blockquote> <p>I've noticed that new Fly Machines can take a while (up to around a minute) to provision. Sprites solve that by keeping warm pools of unused machines in multiple regions, which is enabled by them all using the same container:</p> <blockquote> <p>Now, today, under the hood, Sprites are still Fly Machines. But they all run from a standard container. Every physical worker knows exactly what container the next Sprite is going to start with, so it's easy for us to keep pools of "empty" Sprites standing by. The result: a Sprite create doesn't have any heavy lifting to do; it's basically just doing the stuff we do when we start a Fly Machine.</p> </blockquote> <p>The most interesting detail is how the persistence layer works. Sprites only charge you for data you have written that differs from the base image and provide ~300ms checkpointing and restores - it turns out that's powered by a custom filesystem on top of S3-compatible storage coordinated by Litestream-replicated local SQLite metadata:</p> <blockquote> <p>We still exploit NVMe, but not as the root of storage. Instead, it's a read-through cache for a blob on object storage. S3-compatible object stores are the most trustworthy storage technology we have. I can feel my blood pressure dropping just typing the words "Sprites are backed by object storage." [...]</p> <p>The Sprite storage stack is organized around the JuiceFS model (in fact, we currently use a very hacked-up JuiceFS, with a rewritten SQLite metadata backend). It works by splitting storage into data ("chunks") and metadata (a map of where the "chunks" are). Data chunks live on object stores; metadata lives in fast local storage. In our case, that metadata store is <a href="/litestream.io">kept durable with Litestream</a>. 
Nothing depends on local storage.</p> </blockquote></p> <p> <a href="/simonwillison.net/2026/Jan/15/the-design-implementation-of-sprites/">#</a> <a href="/simonwillison.net/2026/Jan/15/the-design-implementation-of-sprites/">4:08 pm</a> / <a href="/simonwillison.net/tags/architecture/">architecture</a>, <a href="/simonwillison.net/tags/sandboxing/">sandboxing</a>, <a href="/simonwillison.net/tags/sqlite/">sqlite</a>, <a href="/simonwillison.net/tags/thomas-ptacek/">thomas-ptacek</a>, <a href="/simonwillison.net/tags/fly/">fly</a>, <a href="/simonwillison.net/tags/litestream/">litestream</a> </p> </div> <div> <blockquote cite="https://alignment.openai.com/confessions/"><p>When we optimize responses using a reward model as a proxy for "goodness" in reinforcement learning, models sometimes learn to "hack" this proxy and output an answer that only "looks good" to it (because coming up with an answer that is actually good can be hard). The philosophy behind confessions is that we can train models to produce a second output -- aka a "confession" -- that is rewarded solely for honesty, which we will argue is less likely hacked than the normal task reward function. One way to think of confessions is that we are giving the model access to an "anonymous tip line" where it can turn itself in by presenting incriminating evidence of misbehavior. But unlike real-world tip lines, if the model acted badly in the original task, it can collect the reward for turning itself in while still keeping the original reward from the bad behavior in the main task. 
We hypothesize that this form of training will teach models to produce maximally honest confessions.</p></blockquote> <p>-- <a href="/alignment.openai.com/confessions/">Boaz Barak, Gabriel Wu, Jeremy Chen and Manas Joglekar</a>, <span>OpenAI: Why we are excited about confessions</span></p> <p> <a href="/simonwillison.net/2026/Jan/15/boaz-barak-gabriel-wu-jeremy-chen-and-manas-joglekar/">#</a> <a href="/simonwillison.net/2026/Jan/15/boaz-barak-gabriel-wu-jeremy-chen-and-manas-joglekar/">12:56 am</a> / <a href="/simonwillison.net/tags/openai/">openai</a>, <a href="/simonwillison.net/tags/llms/">llms</a>, <a href="/simonwillison.net/tags/ai/">ai</a>, <a href="/simonwillison.net/tags/generative-ai/">generative-ai</a> </p> </div> <h3>Jan. 14, 2026</h3> <div> <p><strong><a href="/www.promptarmor.com/resources/claude-cowork-exfiltrates-files">Claude Cowork Exfiltrates Files</a></strong> (<a href="/news.ycombinator.com/item?id=46622328">via</a>) Claude Cowork defaults to allowing outbound HTTP traffic to only a specific list of domains, to help protect the user against prompt injection attacks that exfiltrate their data.</p> <p>Prompt Armor found a creative workaround: Anthropic's API domain is on that list, so they constructed an attack that includes an attacker's own Anthropic API key and has the agent upload any files it can see to the <code>https://api.anthropic.com/v1/files</code> endpoint, allowing the attacker to retrieve their content later.</p> <p> <a href="/simonwillison.net/2026/Jan/14/claude-cowork-exfiltrates-files/">#</a> <a href="/simonwillison.net/2026/Jan/14/claude-cowork-exfiltrates-files/">10:15 pm</a> / <a href="/simonwillison.net/tags/security/">security</a>, <a href="/simonwillison.net/tags/ai/">ai</a>, <a href="/simonwillison.net/tags/prompt-injection/">prompt-injection</a>, <a href="/simonwillison.net/tags/generative-ai/">generative-ai</a>, <a href="/simonwillison.net/tags/llms/">llms</a>, <a href="/simonwillison.net/tags/anthropic/">anthropic</a>, 
<a href="/simonwillison.net/tags/exfiltration-attacks/">exfiltration-attacks</a>, <a href="/simonwillison.net/tags/ai-agents/">ai-agents</a>, <a href="/simonwillison.net/tags/claude-code/">claude-code</a>, <a href="/simonwillison.net/tags/lethal-trifecta/">lethal-trifecta</a>, <a href="/simonwillison.net/tags/claude-cowork/">claude-cowork</a> </p> </div> <h3>Jan. 13, 2026</h3> <div> <p><strong><a href="/pyfound.blogspot.com/2025/12/anthropic-invests-in-python.html?m=1">Anthropic invests $1.5 million in the Python Software Foundation and open source security</a></strong>. This is outstanding news, especially given our decision to withdraw from that NSF grant application <a href="/simonwillison.net/2025/Oct/27/psf-withdrawn-proposal/">back in October</a>.</p> <blockquote> <p>We are thrilled to announce that Anthropic has entered into a two-year partnership with the Python Software Foundation (PSF) to contribute a landmark total of $1.5 million to support the foundation's work, with an emphasis on Python ecosystem security. 
This investment will enable the PSF to make crucial security advances to CPython and the Python Package Index (PyPI) benefiting all users, and it will also sustain the foundation's core work supporting the Python language, ecosystem, and global community.</p> </blockquote> <p>Note that while security is a focus these funds will also support other aspects of the PSF's work:</p> <blockquote> <p>Anthropic's support will also go towards the PSF's core work, including the Developer in Residence program driving contributions to CPython, community support through grants and other programs, running core infrastructure such as PyPI, and more.</p> </blockquote></p> <p> <a href="/simonwillison.net/2026/Jan/13/anthropic-invests-15-million-in-the-python-software-foundation-a/">#</a> <a href="/simonwillison.net/2026/Jan/13/anthropic-invests-15-million-in-the-python-software-foundation-a/">11:58 pm</a> / <a href="/simonwillison.net/tags/open-source/">open-source</a>, <a href="/simonwillison.net/tags/python/">python</a>, <a href="/simonwillison.net/tags/ai/">ai</a>, <a href="/simonwillison.net/tags/psf/">psf</a>, <a href="/simonwillison.net/tags/anthropic/">anthropic</a> </p> </div> </div>  <div id="secondary"> <h2>Highlights</h2> <ul> <li><a href="/simonwillison.net/2026/Jan/28/dynamic-features-static-site/">Adding dynamic features to an aggressively cached website</a> - Jan. 28, 2026</li> <li><a href="/simonwillison.net/2026/Jan/26/chatgpt-containers/">ChatGPT Containers can now run bash, pip/npm install packages, and download files</a> - Jan. 26, 2026</li> <li><a href="/simonwillison.net/2026/Jan/23/fastrender/">Wilson Lin on FastRender: a browser built by thousands of parallel agents</a> - Jan. 23, 2026</li> <li><a href="/simonwillison.net/2026/Jan/12/claude-cowork/">First impressions of Claude Cowork, Anthropic's general agent</a> - Jan. 
12, 2026</li> <li><a href="/simonwillison.net/2026/Jan/11/answers/">My answers to the questions I posed about porting open source code with LLMs</a> - Jan. 11, 2026</li> <li><a href="/simonwillison.net/2026/Jan/9/sprites-dev/">Fly's new Sprites.dev addresses both developer sandboxes and API sandboxes at the same time</a> - Jan. 9, 2026</li> <li><a href="/simonwillison.net/2026/Jan/8/llm-predictions-for-2026/">LLM predictions for 2026, shared with Oxide and Friends</a> - Jan. 8, 2026</li> <li><a href="/simonwillison.net/2026/Jan/1/gisthost/">Introducing gisthost.github.io</a> - Jan. 1, 2026</li> <li><a href="/simonwillison.net/2025/Dec/31/the-year-in-llms/">2025: The year in LLMs</a> - Dec. 31, 2025</li> <li><a href="/simonwillison.net/2025/Dec/26/slop-acts-of-kindness/">How Rob Pike got spammed with an AI slop "act of kindness"</a> - Dec. 26, 2025</li> <li><a href="/simonwillison.net/2025/Dec/25/claude-code-transcripts/">A new way to extract detailed transcripts from Claude Code</a> - Dec. 25, 2025</li> <li><a href="/simonwillison.net/2025/Dec/23/cooking-with-claude/">Cooking with Claude</a> - Dec. 23, 2025</li> <li><a href="/simonwillison.net/2025/Dec/18/code-proven-to-work/">Your job is to deliver code you have proven to work</a> - Dec. 18, 2025</li> <li><a href="/simonwillison.net/2025/Dec/17/gemini-3-flash/">Gemini 3 Flash</a> - Dec. 17, 2025</li> <li><a href="/simonwillison.net/2025/Dec/15/porting-justhtml/">I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in 4.5 hours</a> - Dec. 15, 2025</li> <li><a href="/simonwillison.net/2025/Dec/14/justhtml/">JustHTML is a fascinating example of vibe engineering in action</a> - Dec. 14, 2025</li> <li><a href="/simonwillison.net/2025/Dec/12/openai-skills/">OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI</a> - Dec. 12, 2025</li> <li><a href="/simonwillison.net/2025/Dec/11/gpt-52/">GPT-5.2</a> - Dec. 
11, 2025</li> <li><a href="/simonwillison.net/2025/Dec/10/html-tools/">Useful patterns for building HTML tools</a> - Dec. 10, 2025</li> <li><a href="/simonwillison.net/2025/Dec/9/canada-spends/">Under the hood of Canada Spends with Brendan Samek</a> - Dec. 9, 2025</li> <li><a href="/simonwillison.net/2025/Nov/26/data-renegades-podcast/">Highlights from my appearance on the Data Renegades podcast with CL Kao and Dori Wilson</a> - Nov. 26, 2025</li> <li><a href="/simonwillison.net/2025/Nov/24/claude-opus/">Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult</a> - Nov. 24, 2025</li> <li><a href="/simonwillison.net/2025/Nov/24/sqlite-utils-40a1/">sqlite-utils 4.0a1 has several (minor) backwards incompatible changes</a> - Nov. 24, 2025</li> <li><a href="/simonwillison.net/2025/Nov/22/olmo-3/">Olmo 3 is a fully open LLM</a> - Nov. 22, 2025</li> <li><a href="/simonwillison.net/2025/Nov/20/nano-banana-pro/">Nano Banana Pro aka gemini-3-pro-image-preview is the best available image generation model</a> - Nov. 20, 2025</li> <li><a href="/simonwillison.net/2025/Nov/19/how-i-automate-my-substack-newsletter/">How I automate my Substack newsletter with content from my blog</a> - Nov. 19, 2025</li> <li><a href="/simonwillison.net/2025/Nov/18/gemini-3/">Trying out Gemini 3 Pro with audio transcription and a new pelican benchmark</a> - Nov. 18, 2025</li> <li><a href="/simonwillison.net/2025/Nov/13/training-for-pelicans-riding-bicycles/">What happens if AI labs train for pelicans riding bicycles?</a> - Nov. 13, 2025</li> <li><a href="/simonwillison.net/2025/Nov/9/gpt-5-codex-mini/">Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican</a> - Nov. 9, 2025</li> <li><a href="/simonwillison.net/2025/Nov/6/upgrading-datasette-plugins/">Video + notes on upgrading a Datasette plugin for the latest 1.0 alpha, with help from uv and OpenAI Codex CLI</a> - Nov. 
6, 2025</li> <li><a href="/simonwillison.net/2025/Nov/6/async-code-research/">Code research projects with async coding agents like Claude Code and Codex</a> - Nov. 6, 2025</li> <li><a href="/simonwillison.net/2025/Nov/4/datasette-10a20/">A new SQL-powered permissions system in Datasette 1.0a20</a> - Nov. 4, 2025</li> <li><a href="/simonwillison.net/2025/Nov/2/new-prompt-injection-papers/">New prompt injection papers: Agents Rule of Two and The Attacker Moves Second</a> - Nov. 2, 2025</li> <li><a href="/simonwillison.net/2025/Oct/28/github-universe-badge/">Hacking the WiFi-enabled color screen GitHub Universe conference badge</a> - Oct. 28, 2025</li> <li><a href="/simonwillison.net/2025/Oct/23/claude-code-for-web-video/">Video: Building a tool to copy-paste share terminal sessions using Claude Code for web</a> - Oct. 23, 2025</li> <li><a href="/simonwillison.net/2025/Oct/22/openai-ciso-on-atlas/">Dane Stuckey (OpenAI CISO) on prompt injection risks for ChatGPT Atlas</a> - Oct. 22, 2025</li> <li><a href="/simonwillison.net/2025/Oct/22/living-dangerously-with-claude/">Living dangerously with Claude</a> - Oct. 22, 2025</li> <li><a href="/simonwillison.net/2025/Oct/20/claude-code-for-web/">Claude Code for web - a new asynchronous coding agent from Anthropic</a> - Oct. 20, 2025</li> <li><a href="/simonwillison.net/2025/Oct/20/deepseek-ocr-claude-code/">Getting DeepSeek-OCR working on an NVIDIA Spark via brute force using Claude Code</a> - Oct. 20, 2025</li> <li><a href="/simonwillison.net/2025/Oct/16/claude-skills/">Claude Skills are awesome, maybe a bigger deal than MCP</a> - Oct. 16, 2025</li> </ul> <section> <h3> Monthly briefing </h3> <p> Sponsor me for <strong>$10/month</strong> and get a curated email digest of the month's most important LLM developments. </p> <p> Pay me to send you less! 
</p> <a href="/github.com/sponsors/simonw/"> Sponsor & subscribe </a> </section> </div>  </div>  <div id="ft"> <ul> <li><a href="/simonwillison.net/about/#about-site">Colophon</a></li> <li>(c)</li> <li><a href="/simonwillison.net/2002/">2002</a></li> <li><a href="/simonwillison.net/2003/">2003</a></li> <li><a href="/simonwillison.net/2004/">2004</a></li> <li><a href="/simonwillison.net/2005/">2005</a></li> <li><a href="/simonwillison.net/2006/">2006</a></li> <li><a href="/simonwillison.net/2007/">2007</a></li> <li><a href="/simonwillison.net/2008/">2008</a></li> <li><a href="/simonwillison.net/2009/">2009</a></li> <li><a href="/simonwillison.net/2010/">2010</a></li> <li><a href="/simonwillison.net/2011/">2011</a></li> <li><a href="/simonwillison.net/2012/">2012</a></li> <li><a href="/simonwillison.net/2013/">2013</a></li> <li><a href="/simonwillison.net/2014/">2014</a></li> <li><a href="/simonwillison.net/2015/">2015</a></li> <li><a href="/simonwillison.net/2016/">2016</a></li> <li><a href="/simonwillison.net/2017/">2017</a></li> <li><a href="/simonwillison.net/2018/">2018</a></li> <li><a href="/simonwillison.net/2019/">2019</a></li> <li><a href="/simonwillison.net/2020/">2020</a></li> <li><a href="/simonwillison.net/2021/">2021</a></li> <li><a href="/simonwillison.net/2022/">2022</a></li> <li><a href="/simonwillison.net/2023/">2023</a></li> <li><a href="/simonwillison.net/2024/">2024</a></li> <li><a href="/simonwillison.net/2025/">2025</a></li> <li><a href="/simonwillison.net/2026/">2026</a></li> <li> <button id="theme-toggle" type="button" aria-label="Toggle theme">    </button> </li> </ul> </div> </body> </html>