Daily Shaarli

All links of one day in a single page.

Today - May 9, 2025

Reservoir Sampling

Reservoir sampling is a technique for selecting a fair random sample when you don't know the size of the set you're sampling from.

An algorithm like this is applicable in cases like rate limiting sending a fixed number of logs per time interval to protect against bursting. A naive greedy approach will result in biases. We want to evenly sample the logs.

The trick is to use a fixed K slots of log entries (K corresponds to the per-bucket rate limit). For upcoming entries, first fill the slots (N<=K), for an upcoming entry after N>K, we decide to replace a slot with the new entry at the probability of K/N. The evicted slot is chosen randomly from the K slots. In the end of the time interval, flush the slots.