Simon Willison's Weblog: substack

Substack Network error = security content they don't allow to be sent

2025-12-28T04:16:27+00:00

I just sent out the latest edition of the newsletter version of this blog. It's a long one! Turns out I wrote a lot of stuff in the past 10 days.

The newsletter is out two days later than I had planned because I kept running into an infuriating issue with Substack: it would refuse to save my content with a "Network error" and "Not saved" and I couldn't figure out why.

So I asked ChatGPT to dig into it, which dug up this Hacker News post about the string /etc/hosts triggering an error.

And yeah, it turns out my newsletter included this post describing a SQL injection attack against ClickHouse and PostgreSQL which included the full exploit that was used.

Deleting that annotated example exploit allowed me to send the letter!

Tags: sql-injection, security, newsletter, substack

How I automate my Substack newsletter with content from my blog

2025-11-19T22:00:34+00:00

I sent out my weekly-ish Substack newsletter this morning and took the opportunity to record a YouTube video demonstrating my process and describing the different components that make it work. There's a lot of digital duct tape involved, taking the content from Django+Heroku+PostgreSQL to GitHub Actions to SQLite+Datasette+Fly.io to JavaScript+Observable and finally to Substack.

The core process is the same as I described back in 2023. I have an Observable notebook called blog-to-newsletter which fetches content from my blog's database, filters out anything that has been in the newsletter before, formats what's left as HTML and offers a big "Copy rich text newsletter to clipboard" button.

I click that button, paste the result into the Substack editor, tweak a few things and hit send. The whole process usually takes just a few minutes.

I make very minor edits:

I set the title and the subheading for the newsletter. This is often a direct copy of the title of the featured blog post.
Substack turns YouTube URLs into embeds, which often isn't what I want - especially if I have a YouTube URL inside a code example.
Blocks of preformatted text often have an extra blank line at the end, which I remove.
Occasionally I'll make a content edit - removing a piece of content that doesn't fit the newsletter, or fixing a time reference like "yesterday" that doesn't make sense any more.
I pick the featured image for the newsletter and add some tags.

That's the whole process!

The Observable notebook

The most important cell in the Observable notebook is this one:

raw_content = {
 return await (
 await fetch(
 `https://datasette.simonwillison.net/simonwillisonblog.json?sql=${encodeURIComponent(
 sql
 )}&_shape=array&numdays=${numDays}`
 )
 ).json();
}

This uses the JavaScript fetch() function to pull data from my blog's Datasette instance, using a very complex SQL query that is composed elsewhere in the notebook.

Here's a link to see and execute that query directly in Datasette. It's 143 lines of convoluted SQL that assembles most of the HTML for the newsletter using SQLite string concatenation! An illustrative snippet:

with content as (
 select
 id,
 'entry' as type,
 title,
 created,
 slug,
 '' 
 || title || ' - ' || date(created) || '
' || body
 as html,
 'null' as json,
 '' as external_url
 from blog_entry
 union all
 # ...

My blog's URLs look like /2025/Nov/18/gemini-3/ - this SQL constructs that three letter month abbreviation from the month number using a substring operation.

This is a terrible way to assemble HTML, but I've stuck with it because it amuses me.

The rest of the Observable notebook takes that data, filters out anything that links to content mentioned in the previous newsletters and composes it into a block of HTML that can be copied using that big button.

Here's the recipe it uses to turn HTML into rich text content on a clipboard suitable for Substack. I can't remember how I figured this out but it's very effective:

Object.assign(
 html`<button style="font-size: 1.4em; padding: 0.3em 1em; font-weight: bold;">Copy rich text newsletter to clipboard`,
 {
 onclick: () => {
 const htmlContent = newsletterHTML;
 // Create a temporary element to hold the HTML content
 const tempElement = document.createElement("div");
 tempElement.innerHTML = htmlContent;
 document.body.appendChild(tempElement);
 // Select the HTML content
 const range = document.createRange();
 range.selectNode(tempElement);
 // Copy the selected HTML content to the clipboard
 const selection = window.getSelection();
 selection.removeAllRanges();
 selection.addRange(range);
 document.execCommand("copy");
 selection.removeAllRanges();
 document.body.removeChild(tempElement);
 }
 }
)

From Django+Postgresql to Datasette+SQLite

My blog itself is a Django application hosted on Heroku, with data stored in Heroku PostgreSQL. Here's the source code for that Django application. I use the Django admin as my CMS.

Datasette provides a JSON API over a SQLite database... which means something needs to convert that PostgreSQL database into a SQLite database that Datasette can use.

My system for doing that lives in the simonw/simonwillisonblog-backup GitHub repository. It uses GitHub Actions on a schedule that executes every two hours, fetching the latest data from PostgreSQL and converting that to SQLite.

My db-to-sqlite tool is responsible for that conversion. I call it like this:

db-to-sqlite \
 $(heroku config:get DATABASE_URL -a simonwillisonblog | sed s/postgres:/postgresql+psycopg2:/) \
 simonwillisonblog.db \
 --table auth_permission \
 --table auth_user \
 --table blog_blogmark \
 --table blog_blogmark_tags \
 --table blog_entry \
 --table blog_entry_tags \
 --table blog_quotation \
 --table blog_quotation_tags \
 --table blog_note \
 --table blog_note_tags \
 --table blog_tag \
 --table blog_previoustagname \
 --table blog_series \
 --table django_content_type \
 --table redirects_redirect

That heroku config:get DATABASE_URL command uses Heroku credentials in an environment variable to fetch the database connection URL for my blog's PostgreSQL database (and fixes a small difference in the URL scheme).

db-to-sqlite can then export that data and write it to a SQLite database file called simonwillisonblog.db.

The --table options specify the tables that should be included in the export.

The repository does more than just that conversion: it also exports the resulting data to JSON files that live in the repository, which gives me a commit history of changes I make to my content. This is a cheap way to get a revision history of my blog content without having to mess around with detailed history tracking inside the Django application itself.

At the end of my GitHub Actions workflow is this code that publishes the resulting database to Datasette running on Fly.io using the datasette publish fly plugin:

datasette publish fly simonwillisonblog.db \
 -m metadata.yml \
 --app simonwillisonblog-backup \
 --branch 1.0a2 \
 --extra-options "--setting sql_time_limit_ms 15000 --setting truncate_cells_html 10000 --setting allow_facet off" \
 --install datasette-block-robots \
 # ... more plugins

As you can see, there are a lot of moving parts! Surprisingly it all mostly just works - I rarely have to intervene in the process, and the cost of those different components is pleasantly low.

Tags: blogging, django, javascript, postgresql, sql, sqlite, youtube, heroku, datasette, observable, github-actions, fly, newsletter, substack

Semi-automating a Substack newsletter with an Observable notebook

2023-04-04T17:55:28+00:00

I recently started sending out a weekly-ish email newsletter consisting of content from my blog. I've mostly automated that, using an Observable Notebook to generate the HTML. Here's how that system works.

What goes in my newsletter

My blog has three types of content: entries, blogmarks and quotations. "Blogmarks" is a name I came up with for bookmarks in 2003.

Blogmarks and quotations show up in my blog's sidebar, entries get the main column - but on mobile the three are combined into a single flow.

These live in a PostgreSQL database managed by Django. You can see them defined in models.py in my blog's open source repo.

My newsletter consists of all of the new entries, blogmarks and quotations since I last sent it out. I include the entries first in reverse chronological order, since usually the entry I've just written is the one I want to use for the email subject. The blogmarks and quotations come in chronological order afterwards.

I'm including the full HTML for everything: people don't need to click through back to my blog to read it, all of the content should be right there in their email client.

The Substack API: RSS and copy-and-paste

Substack doesn't yet offer an API, and have no public plans to do so.

They do offer an RSS feed of each newsletter though - add /feed to the newsletter subdomain to get it. Mine is at https://simonw.substack.com/feed.

So we can get data back out again... but what about getting data in? I don't want to manually assemble a newsletter from all of these different sources of data.

That's where copy-and-paste comes in.

The Substack compose editor incorporates a well built rich-text editor. You can paste content into it and it will clean it up to fit the subset of HTML that Substack supports... but that's a pretty decent subset. Headings, paragraphs, lists, links, code blocks and images are all supported.

The vast majority of content on my blog fits that subset neatly.

Crucially, pasting in images as part of that rich text content Just Works: Substack automatically copies any images to their substack-post-media S3 bucket and embeds links to their CDN in the body of the newsletter.

So... if I can generate the intended rich-text HTML for my whole newsletter, I can copy and paste it directly into the Substack.

That's exactly what my new Observable notebook does: https://observablehq.com/@simonw/blog-to-newsletter

Generating HTML is a well trodden path, but I also wanted a "copy to clipboard" button that would copy the rich text version of that HTML such that pasting it into Substack would do the right thing.

With a bit of help from MDN and ChatGPT (my TIL) I figured out the following:

function copyRichText(html) {
 const htmlContent = html;
 // Create a temporary element to hold the HTML content
 const tempElement = document.createElement("div");
 tempElement.innerHTML = htmlContent;
 document.body.appendChild(tempElement);
 // Select the HTML content
 const range = document.createRange();
 range.selectNode(tempElement);
 // Copy the selected HTML content to the clipboard
 const selection = window.getSelection();
 selection.removeAllRanges();
 selection.addRange(range);
 document.execCommand("copy");
 selection.removeAllRanges();
 document.body.removeChild(tempElement);
}

This works great! Set up a button that triggers that function and clicking that button will copy a rich text version of the HTML to the clipboard, such that pasting it directly into the Substack editor has the desired effect.

Assembling the HTML

I love using Observable Notebooks for this kind of project: quick data integration tools that need a UI and will likely be incrementally improved over time.

Using Observable for these means I don't need to host anything and I can iterate my way to the right solution really quickly.

First, I needed to retrieve my entries, blogmarks and quotations.

I never built an API for my Django blog directly, but a while ago I set up a mechanism that exports the contents of my blog to my simonwillisonblog-backup GitHub repository for safety, and then deploys a Datasette/SQLite copy of that data to https://datasette.simonwillison.net/.

Datasette offers a JSON API for querying that data, and exposes open CORS headers which means JavaScript running in Observable can query it directly.

Here's an example SQL query running against that Datasette instance - click the .json link on that page to get that data back as JSON instead.

My Observable notebook can then retrieve the exact data it needs to construct the HTML for the newsletter.

The smart thing to do would have been to retrieve the data from the API and then use JavaScript inside Observable to compose that together into the HTML for the newsletter.

I decided to challenge myself to doing most of the work in SQL instead, and came up with the following absolute monster of a query:

with content as (
 select
 'entry' as type, title, created, slug,
 '' 
 || title || ' - ' || date(created) || '
' || body
 as html,
 '' as external_url
 from blog_entry
 union all
 select
 'blogmark' as type,
 link_title, created, slug,
 'Link ' || date(created) || ' '
 || link_title || ':' || ' ' || commentary || '
'
 as html,
 link_url as external_url
 from blog_blogmark
 union all
 select
 'quotation' as type,
 source, created, slug,
 'Quote ' || date(created) || ''
 || replace(quotation, '
', '
') || '
' || source || '
'
 as html,
 source_url as external_url
 from blog_quotation
),
collected as (
 select
 type,
 title,
 'https://simonwillison.net/' || strftime('%Y/', created)
 || substr('JanFebMarAprMayJunJulAugSepOctNovDec', (strftime('%m', created) - 1) * 3 + 1, 3) || 
 '/' || cast(strftime('%d', created) as integer) || '/' || slug || '/'
 as url,
 created,
 html,
 external_url
 from content
 where created >= date('now', '-' || :numdays || ' days') 
 order by created desc
)
select type, title, url, created, html, external_url
from collected 
order by 
 case type 
 when 'entry' then 0 
 else 1 
 end,
 case type 
 when 'entry' then created 
 else -strftime('%s', created) 
 end desc

This logic really should be in the JavaScript instead! You can try that query in Datasette.

There are a bunch of tricks in there, but my favourite is this one:

select 'https://simonwillison.net/' || strftime('%Y/', created)
 || substr(
 'JanFebMarAprMayJunJulAugSepOctNovDec',
 (strftime('%m', created) - 1) * 3 + 1, 3
 ) || '/' || cast(strftime('%d', created) as integer) || '/' || slug || '/'
 as url

This is the trick I'm using to generate the URL for each entry, blogmark and quotation.

These are stored as datetime values in the database, but the eventual URLs look like this:

https://simonwillison.net/2023/Apr/2/calculator-for-words/

So I need to turn that date into a YYYY/Mon/DD URL component.

One problem: SQLite doesn't have a date format string that produces a three letter month abbreviation. But... with cunning application of the substr() function and a string of all the month abbreviations I can get what I need.

The above SQL query plus a little bit of JavaScript provides almost everything I need to generate the HTML for my newsletter.

Excluding previously sent content

There's one last problem to solve: I want to send a newsletter containing everything that's new since my last edition - I don't want to send out the same content twice.

I came up with a delightfully gnarly solution to that as well.

As mentioned earlier, Substack provides an RSS feed of previous editions. I can use that data to avoid including content that's already been sent.

One problem: the Substack RSS feed does't include CORS headers, which means I can't access it directly from my notebook.

GitHub offers CORS headers for every file in every repository. I already had a repo that was backing up my blog... so why not set that to backup my RSS feed from Substack as well?

I added this to my existing backup.yml GitHub Actions workflow:

- name: Backup Substack
 run: |-
 curl 'https://simonw.substack.com/feed' | \
 python -c "import sys, xml.dom.minidom; print(xml.dom.minidom.parseString(sys.stdin.read()).topretty xml(indent=' '))" \
 > simonw-substack-com.xml

I'm piping it through a tiny Python script here to pretty-print the XML before saving it, because pretty-printed XML is easier to read diffs against later on.

Now simonw-substack-com.xml is a copy of my RSS feed in a GitHub repo, which means I can access the data directly from JavaScript running on Observable.

Here's the code I wrote there to fetch that RSS feed, parse it as XML and return a string containing just the HTML of all of the posts:

previousNewsletters = {
 const response = await fetch(
 "https://raw.githubusercontent.com/simonw/simonwillisonblog-backup/main/simonw-substack-com.xml"
 );
 const rss = await response.text();
 const parser = new DOMParser();
 const xmlDoc = parser.parseFromString(rss, "application/xml");
 const xpathExpression = "//content:encoded";

 const namespaceResolver = (prefix) => {
 const ns = {
 content: "http://purl.org/rss/1.0/modules/content/"
 };
 return ns[prefix] || null;
 };

 const result = xmlDoc.evaluate(
 xpathExpression,
 xmlDoc,
 namespaceResolver,
 XPathResult.ANY_TYPE,
 null
 );
 let node;
 let text = [];
 while ((node = result.iterateNext())) {
 text.push(node.textContent);
 }
 return text.join("\n");
}

Then I span up a regular expression to extract all of the URLs from that HTML:

previousLinks = {
 const regex = /(?:"|")(https?:\/\/[^\s"<>]+)(?:"|")/g;
 return Array.from(previousNewsletters.matchAll(regex), (match) => match[1]);
}

Added a "skip existing" toggle checkbox to my notebook:

viewof skipExisting = Inputs.toggle({
 label: "Skip content sent in prior newsletters"
})

And added this code to filter the raw content based on whether or not the toggle was selected:

content = skipExisting
 ? raw_content.filter(
 (e) =>
 !previousLinks.includes(e.url) &&
 !previousLinks.includes(e.external_url)
 )
 : raw_content

The url is the URL to the post on my blog. external_url is the URL to the original source of the blogmark or quotation. A match against ether of those should exclude the content from my next newsletter.

My workflow for sending a newsletter

Given all of the above, sending a newsletter out is hardly any work at all:

Ensure the most recent backup of my blog has run, such that the Datasette instance contains my latest content. I do that by triggering this action.
Navigate to https://observablehq.com/@simonw/blog-to-newsletter - select "Skip content sent in prior newsletters" and then click the "Copy rich text newsletter to clipboard" button.
Navigate to the Substack "publish" interface and paste that content into the rich text editor.
Pick a title and subheading, and maybe add a bit of introductory text.
Preview it. If the preview looks good, hit "send".

Copy and paste APIs

I think copy and paste is under-rated as an API mechanism.

There are no rate limits or API keys to worry about.

It's supported by almost every application, even ones that are resistant to API integrations.

It even works great on mobile phones, especially if you include a "copy to clipboard" button.

My datasette-copyable plugin for Datasette is one of my earlier explorations of this. It makes it easy to copy data out of Datasette in a variety of useful formats.

This Observable newsletter project has further convinced me that the clipboard is an under-utilized mechanism for building tools to help integrate data together in creative ways.

Tags: blogging, projects, datasette, observable, cors, newsletter, substack