False Positives, Real Gains: Valkey’s Probabilistic Filters

By Matthew Boehm

Elevator Pitch

Answering simple questions of your data, like “Have I seen this before?”, needs to be fast and efficient, without eating up all your memory. That’s where probabilistic data structures really shine. Come learn how Valkey’s Bloom and Cuckoo filters lie to you (in a good way!) and why a few false positives are worth the real gains.

Description

In the world of large-scale data, asking simple questions like “Have I seen this before?” needs to be fast and efficient, without eating up all your memory. That’s where probabilistic data structures like Bloom filters and Cuckoo filters come in.

In this talk, we’ll explore how Valkey, the high-performance, open-source fork of Redis, uses Bloom and Cuckoo filters to help developers run membership tests without a large memory footprint. We’ll cover the fundamentals: what these filters are, how they work, what makes them memory-efficient, why they occasionally lie, and when to choose one over the other.

We’ll also walk through real-world use cases and practical tips for implementing and tuning these filters in Valkey. Whether you’re building a caching layer or fighting duplicate data, you’ll leave with a clear understanding of how to get the most out of these clever data structures.

Notes

As Principal Architect at Percona, one of the leading companies backing Valkey, I work with my colleagues to encourage more customers to use Valkey in innovative ways that enhance both the scale and performance of their traditional RDBMS deployments. I originally planned to write this topic up as a simple blog post; however, a 45-minute talk allows much more information to be shared. My direct experience helping customers integrate Valkey into their database architectures makes me well suited to deliver this topic.