871 private links
Three math puzzles that have surprisingly simple solutions once the symmetry in perspective is recognized.
Optimizing SQLite indexes:
- prefer composite index over multiple single-column indexes.
- index column order: put more selective ones up front
- how sqlite planner choose to use indexed columns: left-to-right, no skipping, stop at the first range
- left-to-right, no skipping: sqlite cannot use first and third index column alone, so if the query has no mention of the second index, it will become a scan after the first column.
- stop at the first range: the planner only use composite index up to the last range query, then it use scan for the rest of the columns.
- using partial indexes is a great way to reduce index size and make query faster, caveat: the
WHERE
condition in the query must exactly match the condition when defining the index.
The 'static
bound is needed by tokio::spawn
to run the future in background that can outlive the invoking scope. To remove the need for 'static
we can use "structured concurrency", which defines a scope that futures can execute, not beyond. Then the scope is awaited in an async context or been block_on
. Two types of structured concurrency exist: static and dynamic. The static type include select
, join
, etc with a compile-time known set of futures; the dynamic type includes moro, FutureGroup, FuturesUnordered. These constructions do not require 'static
bound, so the futures can freely borrow values from the current scope.
The Send
bound is needed because async runtime executor can move values across threads as needed by the work-stealing mechanism. An alternative to work-stealing is to run task on the current thread only; this way Send
bound is not needed. These runtimes are known as "thread-per-core". Examples include: glommio, monoio, tokio-uring. All the three examples uses io_uring for batching system calls to achieve higher performance. Other options like Embassy deals with single-threaded only, thus requiring no Send
. Note that work-stealing mechanism is an effective way to balance the load, which is useful in reducing the processing latency if the workloads are not balanced per thread.
Real-time tracking of NASA spacecrafts.
A very well written technical article on the optimization Bun takes to make npm install
fast. The set of techniques include reducing context switching (e.g. syscall, worker thread), using binary cache format to avoid json parse, dns prefetch, cache friendly data structures (like in ECS), ditch streaming decompression (cause buffer copies on allocation), use CoW syscalls for file copy, lock free parallelism.
A cheatsheet of strace. Two nice features I didn't know before:
-k
: print stack trace along with the syscall-e inject=...
: tamper with syscalls results
Websites often don't need JavaScript and can achieve a lot with modern HTML and CSS. Examples include interactive toggles, tabs, collapsible header, form validation, relative colors, etc.
An article that clarifies how systemd-resolved handle split DNS for multiple DNS configured across multiple interfaces. Also read https://github.com/systemd/systemd/blob/main/docs/RESOLVED-VPNS.md for detail.
The x86 bios firmware is raw binary of 16-bit real mode machine code. Upon CPU reset, it starts executing at 0xffff0 address, which is 16 bytes below 1MiB. This address space is typically mapped to the ROM containing the BIOS.
This video showcases a few interesting things: qemu isa-debugcon device, qemu monitor, rizin editor (radare2 fork), and coreboot.
Related reading: https://wiki.gentoo.org/wiki/System_Initialization_of_Intel_x86_with_BIOS_Firmware
An intuitive analogy to memory barriers:
- two threads running on two cpu with their own L1 cache are like two person collaboratively editing files on their computer, connected to central file repo.
- on each person's own computer is a working copy (L1 cache) of the files (memory locations) that correspond to the the central repo (memory, or shared cache like L2)
- each person's computer randomly issue
pull
andpush
operations on the files (note that each file is push/pulled independently unlike git). the intuition captures the unpredictable nature of cache flushes and misses, where each cache line operates individually. - when collaborating on a file, a person should manually
pull
on a file before editing. but that's of no use if the other person don'tpush
after their write. but if both party collaborates, there will be no inconsistency. - the
pull
operation is equivalent to aloadload
fence, captured by acquire semantics. thepush
operation is equivalent to astorestore
fence, captured by release semantics. - the
loadstore
fence ensures the out-of-order execution doesn't happen around the barrier such that read always happens before the barrier and write always after the barrier. there is no simple version control analog, but bothloadload
andstorestore
implies theloadstore
fence. - the
storeload
fence ensures the write before the barrier always before reads after the barrier.
In addition to that, some of my own notes:
- relaxed semantics uses special cpu instructions (e.g. LOCK) to skip using non-shared caches (i.e. L1), this ensures all cores access to the same data.
- acqrel semantics uses three operations together:
pull
, change,push
. - seqcst semantics, on top of acqrel guarantee, ensures a total order of load and store. it can be implemented by cpu waiting for the pipeline to finish and flushing the caches before issuing future instructions. (my speculation)
An interesting theory about why the music scales take on the values they have.
- dissonance of two pure tone comes from the inaudible beats in the superposition
- the base frequency + the overtones determine the note and the timber
- overtones are not necessarily harmonics. it depends on the musical instrument.
- adding up all the dissonance graph of all overtones of two notes, we can get a graph of dissonance, where the troughs have lowest dissonance.
- do the same for three notes, we get a 3-d graph of dissonance.
- the insight is that we should consider not only the dissonance of the base notes but also of all overtones
- it's found that many scales (relative to the base) and chords (relative to the lowest note) fall in the troughs of this graph
- conclusion: overtones in the instruments influence the musical scales used by a culture
This article describes the basics about USB protocol enough to overcome my usual fear of dappling into USB hacking. The author demonstrated the means to use libusb (user-space) to interact with an unsupported USB RGB light.
A nice list of toy programming projects to try.
A nice cheatsheet for GNU Make.
Why would anybody bet on Jesus' return in 2025?
Polymarket pegs the price of Yes and No share using the following mechanism. $1 can be minted into 1 Yes share and 1 No share. Conversely, 1 Yes share and 1 No share can combine to redeem into $1. So if the prices of Yes and No shares don't add up to $1, arbitrageurs will come and profit from the difference.
Yes share holders expect the No share holders to sell their No shares at lower price later on, for example, when some more interesting future event happens that they need money to bet on. When that happens they will want to sell their No shares at lower price, or equivalently buy Yes shares to redeem their dollars back. The Yes share holders can therefore sell their tokens and profit from such price raise.
In conclusion, the Yes hoarders are betting not on Jesus but on whether there will be interesting events that people need money to bet on before the end of 2025.
An video explainer of Polymarket mechanism: https://www.youtube.com/watch?v=c6HWyACjE2A
Find vendor names from MAC addresses.
Observation: bitstream with low density of 1s (< p=0.32) can be encoded more efficiently by positions of the ones. We store those positions into a set implemented using bloom filter. We can calculate the theoretically optimal parameters for bloom filter to maximize compression. With a bloomfilter bitmap plus a data structure for recovering info in case of false positive we can achieve lossless compression. To decompress, we iterate through all possible positions, set the bit to one by querying set membership of the location.
To compress video, the author encodes the difference between video frames as a bitstream of mostly one, which then gets compressed with bloom filter.
Another innovation is rational bloom filter - the number of optimal hash functions is usually not integer. Typical implementation just round the number to an integer and use that. The author proposed to use ⌊k⌋ hash functions deterministically and with k-⌊k⌋ probability applying an additional hash. The probability is applied deterministically based on the value.
Stephen Diehl's summary of the most influential advancements after the publication of the Attention Mechanism. The article comes with detailed descriptions and from-scratch codes demonstrating how exactly these techniques work.
A very nice FAQ for Lojban.
Deliminted continuation is a control flow mechanism involving two primitives prompt
and control
. It resembles call/cc
at a glance, but the scope is different.
For example, the following expression returns 7:
(prompt
(+ 1 (control k
(let ([x (k 1)] [y (k 2)] [z (* x y)])
(k z)))
k
is a continuation reified as a function. To understand the code above, read k
as a blackbox function. Its input replaces the (control ...)
where the part of body of (prompt ...)
starting from the (control ...)
gets evaluated, then the output of (prompt ...)
gets returned as the output of (k ...)
. I find visualizing (prompt ...)
as a box and (control k ...)
as a hole helpful for understanding.