871 private links
Reservoir sampling is a technique for selecting a fair random sample when you don't know the size of the set you're sampling from.
An algorithm like this is applicable in cases like rate limiting sending a fixed number of logs per time interval to protect against bursting. A naive greedy approach will result in biases. We want to evenly sample the logs.
The trick is to use a fixed K slots of log entries (K corresponds to the per-bucket rate limit). For upcoming entries, first fill the slots (N<=K), for an upcoming entry after N>K, we decide to replace a slot with the new entry at the probability of K/N. The evicted slot is chosen randomly from the K slots. In the end of the time interval, flush the slots.
Clipper is a network debugging tool that intercepts TLS traffic to allow the traffic to be viewed on Chrome Dev Tools. What interests me the most is how it decrypt TLS traffic.
There are several ways to do that that I know of: The first method is with environment variable SSLKEYLOGFILE
; tools that respect the environment can dump the keys to the specified file, which can picked up by tools like Wireshark. The problem is that many tools doesn't respect the variable out of box. The second is MITM the traffic with a self-signed certificate. This method doesn't work with TLS key pinning and does not truly reflect the traffic due to the proxy layer.
Clipper instead used the trick to LD_PRELOAD a library that uses Frida to hook library functions (e.g. OpenSSL) to extract the keys, and implement a universal SSLKEYLOGFILE
support.
TIL the author of SQLite wrote a tool called sqlite3_rsync
that can be used to copy one database across machines. It does so incrementally like rsync while preserving the integrity of the database so both the source and the replica databases are safe to use when copying.
An encyclopedia of consistency models for concurrent systems.
The algorithm for de-pixelating a pixelated moving window with fixed content:
- extract the frames and find the absolute positions of the window for each frame
- assume the color of each pixelated cell is the color for the center pixel of each cell (think of pixelation as a magnifier filter focusing on the center pixel for each cell)
- by tracking the locations of the centered pixels and its colors in each frame, we fill in more and more pixels with known colors.
- for the remaining unfillled pixels, interpolate from nearest pixel of known colors (e.g. voronoi)
Analyze the bytes in a SQL database.
A small hash function reconstruct a ultra-blurred image from a 20-bit integer, used for image preview. The integer is bit-decomposed into nine values, six brightness values (0-3) at the 2x3 grid-points of the image, three other values to encode a color for the background (oklab). Then the blurred image is rendered by drawing six radial gradient at the points fading into transparency, then a solid background.
A non-interactive zero-knowledge proof scheme: The prover commits (shuffle+lock) the graph N times: s[0..N]
. He then calculate r=hash(s[0] || s[1] || .. || s[N])
. And then he use the random bits in r
generate N
pseduorandom numbers c[0..N]
. For each i=0..N
, use c[i]
to select a fixed edge in s[i]
and reveal the keys k[i][0]
and k[i][1]
for the two nodes of the edge. The final message consists of s[0..N]
and k[0..N][0..1]
. The verifier is able to use these information to replay the steps to verify the prover's claim.
Another thing I learned from the video is how to "lock" messages cryptographically and allow the verifier to verify any message without revealing the others. First, the prover calculate h[i] = hash(m[i] || nounce[i])
and send h[0..N]
to the verifier. The verify choose any integer t
between 0
and N
and send to prover. The prover will then send back (m[t], nounce[t])
to the verifier that h[t]
is indeed calculated from m[t]
. Here h[t]
is the locked message m[t]
and nounce[t]
is the key.
Various IP lists for download - useful for setting up firewall against attackers, malware, p2p, crawlers etc.
I'd like to note the AirTag privacy-preserving tracking scheme I learned:
- the tag shares its key pair with its owning host as part of the pairing process
- the tag broadcasts its public key when away from its host
- nearby phones pick up the broadcast and send a location report to the apple server, which includes:
- the phone's gps location encrypted using the tag's public key
- the hashed public key of the tag
- the apple server store the location reports for each hashed public key
- the host can request the apple server to download location report for the tag using a public key. then it decrypts the gps location using the private key it got during pairing
TIL succinct data structures - data stored in a compact fashion as in a compression, but the compact form itself has useful properties.
The Rank/Select Bit Vector is a bit vector equipped with two fast operations - rank and select. The rank operator returns the number of set bits up to the n-th bit. The select operator returns the index of the n-th set bit, i.e. inverse of rank. The data structure has nice uses as in string lookup, etc.
The Wavelet Matrix is a generalization of rank/select bit vector on non-binary symbols. Another data structure of the class is the FM-index, whose operators extend to working on patterns (substrings) instead of symbols.
How can a calculator program perform an exact calculation involving arithmetic operations, roots, trigonometric functions, etc? It's a difficult task. Here is how it can be done.
- rational type: bignum for both nom and denom
- recursive real arithmetic (RRA):
- represent a real number as a computation that maps from a precision value (a rational) to the rational result etc.
- for composite expression, compute the precision value needed for each of its components (e.g. for 2pi to be shown with a precision of 0.01, pi needs to be calculated to 0.005 precision)
- determine the exact equality between RRA
- undecidable but decidable for a finite set of ops (proven by Dan Richardson, 1994), but slow
- the optimization
- represent every number as an exact rational times a RRA. (e.g. (3/2)pi)
- hard code the common types of RRA (sin, log, ..) whose identity can be checked directly.
- for uncommon types of RRA, just store RRA (e.g. pi * sqrt(2))
the result: the digits shown on screen are always 100% correct. 99% of calculation gets perfect UX.
A site dedicated to Geometric Algebra courses, libraries, discussion, etc.
My notes on Geometric Algebra: https://shaarli.lain.li/shaare/OGClBA
A blog dedicated in explaining obscure and seemingly-magical code snippets. Like the submissions to IOCCC contest.
The "Wats" for Nix language. Also see https://md.darmstadt.ccc.de/xtNP7JuIQ5iNW1FjuhUccw.
Linux audio streams are supposed to be suspended on idle to allow the hardware sink to power-off for power efficiency. However, Firefox expect the callers of AudioContext to explicitly suspend the context, which most developers don't bother. The author developed an extension to automate this. Now it explains why Firefox always shows up in pavucontrol's "playback" tab. TIL powerstat
.
The author studies the reproducibility of the ~100k packages on nixpkgs and found an impressive 91% bitwise reproducibility rate in in the most recent release of nixpkgs. The post discussed the main reasons for unreproducibilities: embedded dates, embedded uname outputs, embedded environment variables, embedded non-deterministic build ids.
Treat an A/B test as an multi-armed bandit RL problem. First we need to record the performance for each option. Then every time we are faced with a decision, calculate the reward for each option, pick the best one. Leave a small chance (e.g. 10%) for exploration where a random option is selected.
Here's how I did it: with the example images on screen (~10cm wide), stare at it from around 25cm away. Cross your eyes until the images overlap. To cross your eyes you should keep the distance to the screen around the same but actively tries to move the focus. Do so until the image overlaps and merge. Now adjust the distance to the screen very slowly while keeping the focus so the merged image don't split, until the merged image become sharp. The merged image may never become as sharp as viewed directly, but that's enough for us. Now notice the spots where the image appears semi-transparent/flickering/bloty. You may not be able to tell in detail what the differences exactly are, but you should be able to tell the relative locations. Note that depending on the focusing, nearby spots may merge as one. Also I found the "hard mode" challenge easier than the "easy mode" in the sense that the difference stands out more clearly.
TIL VoxelSpace, a simple algorithm for rendering terrain map in high detail. The terrain is described by two identical sized textures: the height map and the color map. The color map is preshaded to contain shadows/other features. The rendering is performed from back to front at the viewer's perspective (painter's algorithm). For each depth, calculate a line corresponding to the given depth, use the line to sample textures on the height map and the color map. For each rasterized point of such line, render a vertical line into the view port from the given height to 0 with the given color. Repeat this process until the depth reaches the near z-plane. The algorithm can be optimized by drawing from front to back to avoid drawing unnecessary vertical line segments at the cost of an extra z-buffer.