Reflections on AI at the end of 2025
* Chain of thought is now a fundamental way to improve LLM output. But what is CoT? Why does it improve output? I believe it is two things: 1. Sampling in the model representations (that is, a form of internal search). Once information and concepts relevant to the prompt topic are in the context window, the model can reply better. 2. But if you mix this with reinforcement learning, the model also learns to put one token after the other (each token changes the model state) in order to converge on some useful reply.
Scaling HNSWs
AI is different
Since LLMs, and deep models in general, are poorly understood, and even the most prominent experts in the field have failed miserably again and again to modulate expectations (with incredible errors on both sides: either downplaying or magnifying what was about to come), it is hard to tell what will come next. But even before the Transformer architecture we had been seeing incredible progress for many years, and so far there is no clear sign that the future will not hold more. After all, a plateau of the current systems is possible and very credible, but at this point it would likely stimulate massive research efforts into the next generation of architectures.
Coding with LLMs in the summer of 2025 (an update)
1. Eliminating bugs you introduced in your code before it ever hits any user: I experienced this with the Vector Sets implementation of Redis. I would have ended up eliminating all the bugs eventually, but many were removed immediately thanks to Gemini / Claude code reviews.
Human coders are still better than LLMs
What I learned during the license switch
Then at 6PM I was at home to release my blog post about the AGPL license switch, and I started following the comments, feedback, and private messages, and I learned a few things in the process.
Redis is open source again
I tried to give more strength to the ongoing pro-AGPL license side. My feeling was that the SSPL, in practical terms, failed to be accepted by the community. The OSI wouldn't accept it, nor would the software community regard the SSPL as an open license. In a short time, I saw the hypothesis getting more and more traction, at all levels within the company hierarchy.
Reproducing Hacker News writing style fingerprinting
This is the original post: https://news.ycombinator.com/item?id=33755016
I was not aware, back then, of the Burrows-Delta method for style detection: it seemed kinda magical that you just needed to normalize a frequency vector of top words to reach such remarkable results. I read a few Wikipedia pages and took mental note of it. Then, as I was working with Vectors for Redis, I remembered this post and searched the web, only to discover that the original page was gone and that the author, in the original post and website, didn't really explain very well how the data was processed, how the top words were extracted (and, especially, how many were used), and so forth. I thought I could reproduce the work with Vector Sets once I was done with the main work. Now the new data type is in the release candidate, and I found some time to work on the problem. This is a report of what I did, but before continuing, the mandatory demo site: you can play with it at the following link:
Vector Sets are part of Redis
https://github.com/redis/redis/blob/unstable/modules/vector-sets/README.md
The goal of the new data structure is, in short, to create a new "Set alike" data type, similar to Sorted Sets, where instead of having a scalar as a score, you have a vector, and you can add and remove elements the Redis way, without caring about anything except the properties of the abstract data structure Redis implements, ask for elements similar to a given query vector (or a vector associated to some element already in the set), and so forth. But more about that later, a bit of background, first:
AI is useless, but it is our best bet for the future
Just five minutes ago, I was writing a piece of software and relied on AI for assistance. Yet, here I am, starting this blog post by telling you that artificial intelligence, so far, has proven somewhat useless. How can I make such a statement if AI was just so helpful a moment ago? Actually, there's no contradiction here if we clarify exactly what we mean.
Here's the thing: at this very moment, artificial intelligence can support me significantly. If I'm struggling with complicated code or need to understand an advanced scientific paper on math, I can turn to AI for clarity. It can help me generate an image for a project, make a translation, clean my YouTube transcript. Clearly, it's practical and beneficial in these everyday tasks.
Big LLMs weights are a piece of history
Reasoning models are just LLMs
First, DeepSeek R1 (I don't want to talk about o1 / o3, since it's a private thing we don't have access to, but it's very likely the same) is a pure decoder-only autoregressive model. It's the same next token prediction that was so strongly criticized. There isn't, anywhere in the model, any explicit symbolic reasoning or representation.
We are destroying software
We are destroying software with complex build systems.
We are destroying software with an absurd chain of dependencies, making everything bloated and fragile.
We are destroying software telling new programmers: "Don't reinvent the wheel!". But, reinventing the wheel is how you learn how things work, and is the first step to make new, different wheels.
We are destroying software by no longer caring about backward API compatibility.
From where I left
My detachment was not the result of me hating my past work. While in the long run my creative work was less and less important and the "handling the project" activities became more and more substantial -- a shift that many programmers are able to make, but that's not my bread and butter -- well, I still enjoyed doing Redis stuff when I left. However, I don't share the vision that most people my age (I'm 47 now) have: that they are still young. I wanted to do new stuff, especially writing. I wanted to spend more time with my family and help my relatives. I definitely needed a break.
Playing audio files in a Pi Pico without a DAC
First Token Cutoff LLM sampling
Today one of the most used ones, more or less the default, is called top-p: it is a form of nucleus sampling where top-scoring tokens are collected up to a total probability sum of "p", then random weighted sampling is performed.
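The top-p description above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `top_p_sample`, its parameters, and the plain-list representation of token probabilities are all hypothetical, and real inference engines work on logits and tensors instead.

```python
import random

def top_p_sample(probs, p=0.9, rng=random):
    """Pick a token index using nucleus (top-p) sampling.

    probs is a list of token probabilities (assumed to sum to about 1).
    """
    # Rank token indexes by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Collect top-scoring tokens until their total probability reaches p.
    nucleus, total = [], 0.0
    for i in ranked:
        nucleus.append(i)
        total += probs[i]
        if total >= p:
            break

    # Random weighted sampling restricted to the collected tokens.
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]
```

With a small "p" only the most likely tokens can ever be selected, while "p" close to 1 leaves almost the whole distribution available.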
Translating blog posts with GPT-4, or: on hope and fear
1. Think about what I want to say for weeks or months. No, I don't spend weeks focusing on a blog post, the process is exactly reversed: I write blog posts about things that are important enough to me to be on my mind for weeks.
2. Then, once enough ideas have collapsed together into a decent form, I write the blog post in 30 minutes, often without caring much about the form, and I hit "publish". This process usually works by writing the titles of the sections first, when I only have the big picture of what I want to say, and then filling the empty paragraphs with text.
LLMs and Programming in the first days of 2024
The origins of the Idle Scan
In defense of linked lists
Scrivendo Wohpe
After two years of work, Wohpe, my first science fiction book and also my first prose work of this length, is finally out in Italian physical bookstores, on Amazon, and in the other digital stores. You can find it here: https://www.amazon.it/Wohpe-Salvatore-Sanfilippo/dp/B09XT6J3WX
As I was saying: my first writing of this length. But can I consider myself entirely new to writing? I have written for twenty years on this blog and on the earlier ones I kept over time, and very often I used Facebook to write short stories, born of fantasy or based on real events. Beyond that, I have written about technical matters, especially programming, for just as long, and I have been a reader of short stories and novels for my whole life. So why was writing Wohpe also a matter of learning to write from scratch?
Writing Wohpe
[Sorry for the form of this post. For the first time I wrote a post in two languages: Italian and English. So I went for the unusual path of writing it in Italian to start, translating it with Google Translate, and later I just scanned it to fix the biggest issues. At this point GT is so good you can get away with this process.]
After two years of work, finally, Wohpe, my first science fiction book, but also my first prose writing of this length, has been released in Italian physical bookstores, on Amazon, and in other digital stores. You can find it here: https://www.amazon.it/Wohpe-Salvatore-Sanfilippo/dp/B09XT6J3WX
Programming and Writing
The most obvious parallel between the two activities is that in both of them you write something. Code is not prose written in a natural language, yet it has a set of fixed rules (a grammar), certain forms that most programmers will understand as natural and others that, while formally correct, will sound hard to grasp.
The open source paradox
As somebody said, the best code is written when you are supposed to do something else [1]. Just as a writer will do her best when writing that novel that, maybe, nobody will pay a single cent for, and not when doing copywriting work for a well known company, programmers are likely to spend more energy on their open source side projects than during office hours, while writing another piece of a project they feel is stupid, boring, pointless. And, if the company is big enough, chances are it will be cancelled in six months anyway or retired one year after the big launch.
The end of the Redis adventure
Redis 6.0.0 GA is out!
So the big news items are the ones announced before, but with some notable changes. The old items are: SSL, ACLs, RESP3, Client side caching, Threaded I/O, Diskless replication on replicas, Cluster support in Redis-benchmark and improved redis-cli cluster support, Disque in beta as a module of Redis, and the Redis Cluster Proxy (now at https://github.com/RedisLabs/redis-cluster-proxy).
Redis 6 RC1 is out today
Client side caching in Redis 6
The New York Redis day was over. I got up at the hotel at 5:30, still pretty much in sync with the Italian time zone, and immediately went walking on the streets of Manhattan, completely in love with the landscape and the wonderful feeling of being just a number among millions of other numbers. Yet I was thinking about the Redis 6 release with the feeling that what was probably the most important feature of all, the new version of the Redis protocol (RESP3), was going to have a very slow adoption curve, and for good reasons: wise people avoid switching tools without very good reasons. After all, why did I want to improve the protocol so badly? For two reasons mainly: to provide clients with more semantic replies, and to open the way to new features that were hard to implement with the old protocol; one feature in particular was the most important to me: client side caching.
The struggles of an open source maintainer
Redis streams as a pure data structure
Gopher: a present for Redis
An update about Redis developments in 2019
-- https://news.ycombinator.com/item?id=19204436 --
I love Redis, but I'm a bit skeptical of some of the changes that are currently in development. The respv3 protocol has some features that, while they sound neat, also could significantly complicate client library code. There's also a lot of work going into a granular acl. I can't imagine why this would be necessary, or a higher priority than other changes like multi-thread support, better persistence model, data-types, etc.
Why RESP3 will be the only protocol supported by Redis 6
from Stack Overflow suggested that we could just switch protocol per-connection for backward compatibility, by sending a command to enable RESP3. That means there is no longer a need for a global configuration that switches the behavior of the server. Put that way it is a lot more acceptable to me, and I'm reconsidering the essence of the blog post]
A few weeks after the release of Redis 5, I'm here starting to implement RESP3, and after a few days of work it feels very good to see this finally happening. RESP3 is the new client-server protocol that Redis will use starting from Redis 6. The specification at https://github.com/antirez/resp3 should explain in clear terms how this evolution of our old protocol, RESP2, should improve the Redis ecosystem. But let's say that the most important thing is that RESP3 is more "semantic" than RESP2. For instance it has the concept of maps, sets (unordered lists of elements), attributes of the returned data that may augment the reply with auxiliary information, and so forth. The final goal is to make new Redis clients have less work to do for us, that is, just deciding a set of fixed rules in order to convert every reply type from RESP3 to a given appropriate type of the client library programming language.
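To make the idea of "a set of fixed rules" concrete, here is a minimal Python sketch of what such a rule table could look like in a client. The one-character type markers are the ones defined in the RESP3 specification; the names `RESP3_TYPE_RULES` and `python_type_for`, and the choice of Python target types, are assumptions made for illustration, not part of any real client library.

```python
# First byte of a RESP3 reply -> Python type a client might return.
# The Python targets are an illustrative assumption, and only a
# subset of the RESP3 types is covered.
RESP3_TYPE_RULES = {
    "+": str,    # simple string
    "$": bytes,  # blob string
    ":": int,    # number
    ",": float,  # double
    "#": bool,   # boolean
    "*": list,   # array
    "%": dict,   # map
    "~": set,    # set (unordered collection of elements)
}

def python_type_for(reply: bytes) -> type:
    """Return the Python type chosen for a raw RESP3 reply, from its first byte."""
    marker = chr(reply[0])
    if marker not in RESP3_TYPE_RULES:
        raise ValueError(f"no conversion rule for RESP3 type {marker!r}")
    return RESP3_TYPE_RULES[marker]
```

With a table like this the client no longer needs per-command knowledge to tell, say, a map reply from a flat array: the reply itself carries that information.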
Writing system software: code comments.
LOLWUT: a piece of art inside a database command
As I was changing the Redis source code to get rid of a specific word where possible, I started to think that whatever my idea was about the work I was doing, I'm the kind of person that enjoys writing code that has no measurable technological effects. Replacing words is just annoying, even if, even there, there were a few worthwhile technological challenges. But there is some other kind of code that I believe has a quality called "hack value". It may not solve any technological problem, yet it's worth writing. Sometimes because the process of writing the code is, itself, rewarding. Other times because very technically advanced ideas are used to solve a not useful problem. Sometimes code is just written for artistic reasons.
On Redis master-slave terminology
I said that I was sorry he was disappointed about that, but at the same time I don't believe that terminology out of context is offensive, so if I use master-slave in the context of databases, I'm not referring in any way to slavery. I originally copied the terms from MySQL, and now they are the way we call things in Redis, and since I do not believe in this battle (I'll tell you later why), changing the documentation, deprecating the API and adding a new one, and changing the INFO fields, just to make a subset of people who care about those things happier, does not make sense to me.
Redis is not "open core"
The simplification this time does not work if it is in your interest to capture the truth of what is happening here. An open core technology requires two things. One is that the system is modular, and the other is that parts of such a system are made proprietary in order to create a product around otherwise free software. For example, providing a single node of a database as open source, and then having the clustering logic and mechanism implemented in a different non-free layer, is an open core technology. Similarly, it is open core if I write a relational database with a modular storage system, but the only storage that is able to provide strong guarantees is non-free. In an open core business model around an open source system it is *fundamental* that you take something useful out of the free software part.
Redis will remain BSD licensed
Redis Lua scripting: several security vulnerabilities fixed
Clarifications on the Incapsula Redis security report
[1] https://www.incapsula.com/blog/report-75-of-open-redis-servers-are-infected.html
Many folks don't need any clarification about all this, because if you have some grip on computer security and how Redis works, you can contextualize all this without much effort. However I'm writing this blog post for two reasons. The obvious one is that it can help the press and other users that are not much into security and/or Redis to understand what's going on. The second is that the exposed Redis instances are a case study about safe defaults that should be interesting for the security circles.
A short tale of a read overflow
When a long running process crashes, it is pretty uncool. More so if the process happens to hold a lot of state in memory. This is why I love web programming frameworks that are able, without major performance overhead, to create a new interpreter and a new state for each page view, and deallocate every resource used at the end of the page generation. It is an inherently more reliable programming paradigm, where memory leaks, descriptor leaks, and even random crashes from time to time do not constitute a serious issue. However system software like Redis is at the other side of the spectrum, a side populated by things that should never crash.
An update on Redis Streams development
To start, at the moment Streams are my main priority: I want to finish this work, which I believe is very useful to the Redis community, and immediately start with the Redis Cluster improvement plans. Actually the work on Cluster has already started, with my colleague Fabio Nicotra, who is porting redis-trib, the Cluster management tool, into the good old redis-cli. This step involves translating the code from Ruby to C. In the meantime, a few weeks ago I finished writing the Streams core, and I deleted the "streams" feature branch, merging everything into the "unstable" branch.
Redis PSYNC2 bug post mortem
Streams: a new general purpose data structure in Redis.
Doing the FizzleFade effect using a Feistel network
You may wonder why the original code used an LFSR or why I'm proposing a different approach, instead of the vanilla setPixel(rand(),rand()): doing this with a pseudo random generator, as noted in the blog post, is slow, but is also visually very unpleasant, since the more red pixels you have on the screen already, the less likely it is that you hit a new yet-not-red pixel, so the final pixels take forever to turn red (I *bet* that many readers of this blog post tried it in the old times of the Spectrum, C64, or later with QBASIC or GWBasic). In the final part of the blog post the author writes:
The mythical 10x programmer
The programming community is extremely polarized about the existence or not of such a beast: some say there is no such thing as the 10x programmer, others say that not only does it exist, but there are even 100x programmers if you know where to look.
Redis on the Raspberry Pi: adventures in unaligned lands
The first release candidate of Redis 4.0 is out
It's just that Redis 4.0 has a lot of things that Redis should have had ages ago, in a different world where one developer can, like Ken The Warrior, duplicate himself into ten copies and start to code. But no matter how hard I try to learn new vim shortcuts, the duplicate-me thing is still not within my powers.
Random notes on improving the Redis LRU algorithm
In other terms every cache has a hits/misses ratio, which is, in qualitative terms, just the percentage of read queries that the cache is able to serve. Accesses to the keys of a cache are not distributed evenly among the data set in most workloads. Often a small percentage of keys gets a very large percentage of all the accesses. Moreover the access pattern often changes over time, which means that as time passes certain keys that were very requested may no longer be accessed often, and conversely, keys that once were not popular may turn into the most accessed keys.
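As a rough illustration of the hits/misses ratio, here is a tiny exact LRU cache in Python that counts how many reads it was able to serve. The class name `LRUCache` and its methods are hypothetical, and this textbook exact LRU is only a sketch used to show the metric, not a description of how Redis implements eviction.

```python
from collections import OrderedDict

class LRUCache:
    """Exact LRU cache that tracks how many reads it could serve."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest access first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.items[key]
        self.misses += 1
        return None

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used

    def hit_ratio(self):
        # Fraction of read queries the cache was able to serve.
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Feeding such a cache a skewed workload, where a few keys get most of the reads, is a simple way to see how much the eviction policy affects the ratio.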
Writing an editor in less than 1000 lines of code, just for fun
Screencast here: https://asciinema.org/a/90r2i9bq8po03nazhqtsifksb
For the sentimentalists, keep reading...
A couple weeks ago there was this news about the Nano editor no longer being part of the GNU project. My first reaction was, wow people still really care about an old editor which is a clone of an editor originally part of a terminal based EMAIL CLIENT. Let's say this again, "email client". The notion of email client itself is gone at this point, everything changed. And yet I read, on Hacker News, a number of people writing how they were often saved by the availability of nano on random systems, doing system administrator tasks, for example. Nano is also how my son wrote his first program in C. It's an acceptable experience that does not require past experience editing files.
Programmers are not different, they need simple UIs.
I really mean it when I say *days*, just for the API. Writing drafts, starting the implementation, shaping data structures and calls, and then restarting from scratch to iterate again in a better way, to improve the design and the user facing part.
Why do I do that, delaying features for weeks? Is it really so important?
Programmers are engineers, maybe they should just adapt to whatever API is better to export for the system exporting it.
Redis Loadable Modules System
Modules can be the most interesting feature of a system and the most problematic one at the same
time: API incompatibilities between versions, low quality modules crashing the system, a lack
Three ideas about text messages
I don't mind staying disconnected for some time usually. It's a good time to focus, write some code, or a blog post like this one. However when I'm disconnected, what makes the most difference is not Facebook or Twitter or Github, but the lack of text messages.
At this point text messages are a fundamental thing in my life. They are also probably the main source of distraction. I use messages to talk with my family, even just to communicate between different floors. I use messages with friends to organize dinners and vacations. I even use messages with the plumber or the doctor.
Redis 3.2.0 is out!
* The GEO API. Index whatever you want by latitude and longitude, and query by radius, with the same speed and ease of use of the other Redis data structures. Here you can find the API documentation: http://redis.io/commands/#geo. Thank you to Matt Stancliff for the initial implementation, that was reworked but is still at the core of the GEO API, and to the developers of ARDB for providing the geo indexing code that Matt used.
100 more of those BITFIELDs
The essence of this command is not new: it was proposed in the past by me and others, but never in a serious way, since the idea always looked a bit strange. We already have bit operations in Redis: certain users love them, they are a good way to represent a lot of data in a compact way. However so far we handle each bit separately, setting, testing, getting bits, counting all the bits that are set in a range, and so forth.
The binary search of distributed programming
This apparently simple problem can be more complex than it looks at first glance, considering that it must ensure that, in all conditions, there is a safety property which is always guaranteed: the ID generated is always greater than all the past IDs generated, and the same ID cannot be generated multiple times. This must hold during network partitions and other failures. The system may just become unavailable if fewer than a majority of nodes can be reached, but it must never provide the wrong answer (note: as we'll see this algorithm has another liveness issue that happens during high load of requests).
Is Redlock safe?
Redlock is a client side distributed locking algorithm I designed to be used with Redis, but the algorithm orchestrates, client side, a set of nodes that implement a data store with certain capabilities, in order to create a multi-master fault tolerant, and hopefully safe, distributed lock with auto release capabilities.
Disque 1.0 RC1 is out!
If you don't know what Disque is, the best starting point is to read the README in the Github project page at http://github.com/antirez/disque.
Disque is just a piece of software, so it has a material value which can be zero or more, depending on its ability to do useful things for the people using it. But for me there is a huge value that goes beyond what Disque, materially, is. It is the value of designing and doing something you care about. It's the magic of programming: where there was nothing, now there is something that works, that other people may potentially analyze, run, use.
Generating unique IDs: an easy and reliable way
The post was one of the top news items on Hacker News today. It's pretty clear and informative from the point of view of how Math.random() is broken and how it should be fixed, so I've nothing to add to the matter itself. But since the author discovered the weakness of the PRNG in the context of generating large probably-non-colliding IDs, I want to share with you an alternative that I used multiple times in the past, which is fast and extremely reliable.
6 years of commit visualized
Full size image here: http://antirez.com/misc/commitsvis.png
Each commit is a rectangle. The height is the number of affected lines (a logarithmic scale is used). The gray labels show release tags.
There are few surprises since the number of commits remained pretty much the same over time, however now that we no longer backport features back into 3.0 and future releases, the rate at which new patchlevel versions are released diminished.
Recent improvements to Redis Lua scripting
1. A proper debugger for Redis Lua scripts.
2. Replication, and storage on the AOF, of Lua scripts as a set of write commands materializing the *effects* of the script, instead of replicating the script itself as we normally do.
A few things about Redis security
From time to time I get security reports about Redis. It's good to get reports, but it's odd that what I get is usually about things like Lua sandbox escaping, insecure temporary file creation, and similar issues, in software which is designed (as we explain in our security page here http://redis.io/topics/security) to be totally insecure if exposed to the outside world.
Moving the Redis community on Reddit
This probably looks like a crazy idea in some way, and "to move" is probably not the right verb, since the ML will still exist. However it will only be used in order to receive announcements of new releases, critical information like security related news, and from time to time, links to very important discussions that are happening on Reddit.
Clarifications about Redis and Memcached
However it is also true that in order to pick the right solution users must be correctly informed.
This post was triggered by reading a blog post published by Mike Perham, whom you may know as the author of a popular library called Sidekiq, which happens to use Redis as its backend. So I would not consider Mike a person who is "against" Redis at all. Yet in his blog post, which you can find at the URL http://www.mikeperham.com/2015/09/24/storing-data-with-redis/, he states that, for caching, "you should probably use Memcached instead [of Redis]". So Mike simply really believes Redis is not good for caching, and he argues his thesis in this way:
Lazy Redis is better Redis
However some time ago I opened an issue where I promised a new Redis feature that many wanted, me included, called "lazy free". The original issue is here: https://github.com/antirez/redis/issues/1748.
About Redis Sets memory efficiency
On Hacker News people asked why not use Redis instead: https://news.ycombinator.com/item?id=10118413
Amplitude developers have their set of reasons for not using Redis, and in general if you have a very specific problem and want to scale it in the best possible way, it makes sense to implement your vertical solution. I'm not averse to reinventing the wheel: sometimes you want your very specific wheel, one that a general purpose system may not be able to provide. Moreover creating your solution gives you control over what you did, boosts your creativity and your confidence in what you, as a developer, can do, and makes you able to debug whatever bug may arise in the future without external help.
Thanks Pivotal, Hello Redis Labs
Commit messages are not titles
Plans for Redis 3.2
Adventures in message queues
For a few months now I have been spending ~15-20% of my time, mostly hours stolen from nights and weekends, working on a new system. It's a message broker and it's called Disque. I have an implementation of 80% of what was in the original specification, but I still don't feel like it's ready to be released. Since I can't ship, I'll at least blog... so this is the story of how it started and a few details about what it is.
Redis Conference 2015
Side projects
How did I stop doing new things to focus on a single effort, drastically monopolizing my professional life? It was too big a sacrifice to make for a human being with a limited life span. Fortunately I simply never did this, I never stopped doing new things.
Why we don't have benchmarks comparing Redis with other DBs
Redis latency spikes and the Linux kernel: a few more details
The test was performed with 1GB of data in memory, with 150k writes per second originating from a different EC2 instance, targeting 5 million keys (evenly distributed). The pipeline was set to 4 commands. This translates to the following command line of redis-benchmark:
Redis latency spikes and the 99th percentile
This is why I can't have conversations using Twitter
Diskless replication: a few design notes.
The feature is not exactly a new idea: it was proposed several times, especially by EC2 users who know that sometimes it is not trivial for a master to provide good performance during slave synchronization. However there are a number of use cases where you don't want to touch disks, even running on physical servers, and especially when Redis is used as a cache. Redis replication was, in short, forcing users to use disk even when they didn't need or want disk durability.
A few arguments about Redis Sentinel properties and fail scenarios.
"OH on Redis Sentinel "They kill -9'd the master, which caused a split brain..."
"then the old master popped up with no data and replicated the lack of data to all the other nodes. Literally had to restore from backups."
OMG we have some nasty bug, I thought. However I tried to get more information from Kyle, and he replied that the users had actually disabled disk persistence entirely on the master process. Yep: the master was configured on purpose to restart with a wiped data set.
Redis cluster, no longer vaporware.
Basically it is a roughly 4 year old project. This is about two thirds of the whole history of the Redis project. Yet it is only today that I'm releasing a Release Candidate, the first one, of Redis 3.0.0, which is the first version with Cluster support.
Queues and databases
A proposal for more reliable locks using Redis
UPDATE: The algorithm is now described in the Redis documentation here => http://redis.io/topics/distlock. The article is left here in its older version, the updates will go into the Redis documentation instead.
-----------------
Many people use Redis to implement distributed locks. Many believe that this is a great use case, and that Redis works great to solve an otherwise hard to solve problem. Others believe that this is a totally broken, unsafe, and wrong use case for Redis.
Using Heartbleed as a starting point
Redis new data structure: the HyperLogLog
Counting unique things
===
Usually counting unique things, for example the number of unique IPs that connected today to your web site, or the number of unique searches that your users performed, requires remembering all the unique elements encountered so far, in order to match the next element against the set of already seen elements, and increment a counter only if the new element was never seen before.
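The exact approach just described can be sketched in a few lines of Python. The function name `count_unique` is hypothetical; the point of the sketch is that the `seen` set grows with the number of distinct elements, which is the memory cost that makes exact counting expensive.

```python
def count_unique(stream):
    """Count distinct elements by remembering every element seen so far."""
    seen = set()  # grows with the number of unique elements
    count = 0
    for element in stream:
        # Increment the counter only if the element was never seen before.
        if element not in seen:
            seen.add(element)
            count += 1
    return count
```

For instance, `count_unique(["1.1.1.1", "2.2.2.2", "1.1.1.1"])` keeps two IPs in memory to report two unique visitors.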
Fascinating little programs
I was trying to merge a few pull requests, to fix issues, and doing some refactoring at the same time. It was some kind of nirvana I was feeling: a complete control of small, self-contained, and useful code.
There is something special in simple code. Here I'm not referring to simplicity as a way to fight complexity or over engineering, but to simplicity per se, self-referential, with no goals other than beauty, understandability and elegance.
What is performance?
VA good starting point is probably the first slide I use lately in my talks about Redis. This first slide is indeed about performance, and says that performance is mainly three different things.
Happy birthday Redis!
VI'm a bit shocked I worked for five years straight on the same thing. The opportunities for learning new things I had because of the directions where Redis pushed me, and the opportunities to learn new things that I missed because I had almost consistently no time for random hacking, are huge.
A simple distributed algorithm for small idempotent information
VThe algorithm is useful when you need to keep some kind of information synchronized among a number of processes.
The information can be anything as long as it is composed of a small number of bytes, and as long as it is idempotent, that is, the current value of the information does not depend on the previous value, and we can just replace an old value with the new one.
Redis Cluster and limiting divergences.
VSome fun with Redis Cluster testing
VIn order to perform some testing I assembled an environment like this:
* Hardware: 6 real computers: 2 MacBook Pro, 2 MacBook Air, 1 Linux desktop, 1 tiny Linux laptop called EEEpc running with a single core at 800 MHz.
Redis as AP system, reloaded
VAt the end of the work day I was reading about Redis as AP and merge operations on Twitter. At the same time I was having a private email exchange with Alexis Richardson (from RabbitMQ, and my boss). Alexis at some point proposed that perhaps a way to improve safety was to asynchronously ACK the client about what commands actually were not received, so that the client could retry. This seemed like a lot of effort on the client side, but somehow it totally opened my view on the matter.
The Redis criticism thread
Vhttps://groups.google.com/forum/#!topic/redis-db/Oazt2k7Lzz4
The thread has reached 89 posts so far, probably one of the biggest threads in the history of the Redis Google group.
The main idea was that critiques are a mix of pointless attacks and truth, so extracting the truth from critiques can be a good exercise: it means getting some seed ideas for future improvements from the part of the population that is not using, or is not happy with, your system.
WAIT: synchronous replication for Redis
VThe feature was extremely easy to implement because of previous work. WAIT was basically a direct consequence of the new Redis replication design (that started with Redis 2.8). The feature itself is in a form that respects the design of Redis, so it is relatively different from other implementations of synchronous replication, both at API level, and from the point of view of the degree of consistency it is able to ensure.
Blog lost and recovered in 30 minutes
VI just started a screen instance, and ran something like ./redis-server --port 10000. Since this is equivalent to an empty config file with just "port 10000" inside, I was running with no disk persistence at all.
Since Redis very rarely crashes, guess what, after more than one year it was still running inside the screen session, and I totally forgot that it was running like that, happily writing controversial posts in my blog. Yesterday my server was under attack. This caused a higher than normal load, and Linode rebooted the instance. As a result my blog was gone.
The fight against sexism is not a free pass
VBasically the developer Ben Noordhuis rejected a pull request involving a change in the documentation to use a gender-neutral form instead of "him". Joyent replied with this incredible post: http://www.joyent.com/blog/the-power-of-a-pronoun.
In the blog post you can read:
"But while Isaac is a Joyent employee, Ben is not--and if he had been, he wouldn't be as of this morning: to reject a pull request that eliminates a gendered pronoun on the principle that pronouns should in fact be gendered would constitute a fireable offense for me and for Joyent."
Finally Redis collections are iterable
VIt is limited because it only allows accessing data in a natural way, that is, in a way that is obvious for the data structure. Sorted sets are easy to access by score ranges, hashes by field name, and so forth.
This API "way" has profound effects on what Redis is and how users organize data into it, because an API that is data-obvious means fast operations, less code and fewer bugs in the implementation, but especially it forces the application layer to make meaningful choices: the database as a system in which you are responsible for organizing data in a way that makes sense in your application, versus a database as a magical object where you put data inside, and then it will be able to fetch and organize data for you in any format.
New Redis Cluster meta-data handling
VLet's start with the problem to solve. Redis Cluster uses a master-slave design in order to recover from node failures. The key space is partitioned across the different masters in the cluster, using a concept that we call "hash slots". Basically every key is hashed into a number between 0 and 16383. If a given key hashes to 15, it means it is in the hash slot number 15. These 16k hash slots are split among the different masters.
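The mapping from keys to hash slots, and from slots to masters, can be sketched in a few lines of Python. This is only an illustration under a few assumptions that are not in the text above: the hash function is taken to be CRC16 (XMODEM variant) modulo 16384, as in the Redis Cluster specification, hash tags are ignored, and the slots are split into contiguous, roughly equal ranges. The function names and the master names "A", "B", "C" are hypothetical.

```python
import binascii

HASH_SLOTS = 16384  # slots are numbered from 0 to 16383


def key_hash_slot(key: bytes) -> int:
    """Hash a key into a slot number between 0 and 16383.

    Assumption: CRC16/XMODEM, which binascii.crc_hqx computes when the
    initial value is 0. Hash tags ("{...}") are not handled here.
    """
    return binascii.crc_hqx(key, 0) % HASH_SLOTS


def split_slots(masters):
    """Assign a contiguous range of slots to each master, as evenly as possible."""
    base, extra = divmod(HASH_SLOTS, len(masters))
    ranges = {}
    start = 0
    for i, name in enumerate(masters):
        # The first `extra` masters get one slot more than the others.
        size = base + (1 if i < extra else 0)
        ranges[name] = range(start, start + size)
        start += size
    return ranges


def master_for_key(key: bytes, ranges) -> str:
    """Return the name of the master that serves the slot of the given key."""
    slot = key_hash_slot(key)
    for name, slots in ranges.items():
        if slot in slots:
            return name
    raise KeyError(slot)


ranges = split_slots(["A", "B", "C"])   # hypothetical three-master cluster
owner = master_for_key(b"foo", ranges)
```

In a real cluster the assignment of slots to masters is part of the cluster configuration and does not need to be contiguous or balanced; the even split here is just the simplest possible layout.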
English has been my pain for 15 years
VTwilio incident and Redis
Vhttp://www.twilio.com/blog/2013/07/billing-incident-post-mortem.html
The problem was about a Redis server, since Twilio is using Redis to store the in-flight account balances, in a master-slaves setup, with multiple slaves in different data centers for obvious availability and data safety concerns.
This is a short analysis of the incident, what Twilio can do and what Redis can do to avoid this kind of issue.
San Francisco
VReaching San Francisco
===
If you want to reach San Francisco from Sicily, there are no direct flights to help you. My flight was a Lufthansa flight from Catania to Munich, and finally from Munich to San Francisco. This is a total of 15 hours of flight, plus the stop in Munich waiting for the second flight.