antirez 2506 days ago. 168013 views. The new Redis data structure introduced in Redis 5 under the name of "Streams" generated quite some interest in the community. Soon or later I want to run a community survey, talking with users having production use cases, and blogging about it. Today I want to address another issue: I'm starting to suspect that many users are only thinking at Streams as a way to solve Kafka(TM)-alike use cases. Actually the data structure was designed to *also* work in the context of messaging with producers and consumers, but to think that Redis Streams are just good for that is incredibly reductive. Streaming is a terrific pattern and "mental model" that can be applied when designing systems with great success, but Redis Streams, like most Redis data structures, are more general, and can be used to model dozen of different unrelated problems. So in this blog post I'll focus on Streams as a pure data structure, completely ignoring its blocking operations, consumer groups, and all the messaging parts.

## Streams are CSV files on steroids

If you want to log a series of structured data items and decided that databases are overrated after all, you may say something like: let's just open a file in append only mode, and log every row as a CSV (Comma Separated Value) item:

(open data.csv in append only)
time=1553096724033,cpu_temp=23.4,load=2.3
time=1553096725029,cpu_temp=23.2,load=2.1

Looks simple and people did this for ages and still do: it's a solid pattern if you know what you are doing. But what is the in-memory equivalent of that? Memory is more powerful than an append only file and can automagically remove the limitations of a CSV file like that:

1. It's hard (inefficient) to do range queries here.

2. There is too much redundant information: the time is almost the same in every entry and the fields are duplicated. At the same time removing it will make the format less flexible, if I want to switch to a different set of fields.

3. Item offsets are just the byte offset in the file: if we change the file structure the offset will be wrong, so there is no actual true concept of primary ID here. Entries are basically not univocally addressed in some way.

4. I can't remove entries, but only mark them as no longer valid without the ability of garbage collecting, if not by rewriting the log. Log rewriting usually sucks for several reasons and if it can be avoided, it's good.

Still such log of CSV entries is also great in some way: there is no fixed structure and fields may change, is trivial to generate, and after all is quite compact as well. The idea with Redis Streams was to retain the good things, but go over the limitations. The result is a hybrid data structure very similar to Redis Sorted Sets: they *feel like* a fundamental data structure, but to get such an effect, internally it uses multiple representations.

## Streams 101 (you may skip that if you know already Redis Stream basics)

Redis Streams are represented as delta-compressed macro nodes that are linked together by a radix tree. The effect is to be able to seek to random entries in a very fast way, to obtain ranges if needed, remove old items to create a capped stream, and so forth. Yet our interface to the programmer is very similar to a CSV file:

> XADD mystream * cpu-temp 23.4 load 2.3
"1553097561402-0"
> XADD mystream * cpu-temp 23.2 load 2.1
"1553097568315-0"

As you can see from the example above the XADD command auto generates and returns the entry ID, which is monotonically incrementing and has two parts:
blog comments powered by Disqus
: