Reversing Conway's Game of Life is famously hard to compute. However, it's possible to approximate the inverse using gradient descent if we reformulate GoL as a continuous computation.
Here's how it works. We represent the board as a grid of continuous values in [0, 1]. To compute the next step, we first take a convolution with a 3x3 kernel that sums the neighboring living cells. Then we map the result into the [0, 1] range with a continuous function corresponding to the alive rule; the article uses a narrow Gaussian centered at n=3. Because every step is now differentiable, we can run gradient descent to find an approximation of the inverse step, as in the sketch below.
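A minimal sketch of the idea (my own, assuming PyTorch; the exact smooth rule, the sigmoid reparameterization, and the optimizer settings are my choices, not necessarily the article's):

```python
import torch
import torch.nn.functional as F

def soft_life_step(board, sigma=0.3):
    """Differentiable Game-of-Life step on a [0, 1]-valued board of shape (1, 1, H, W)."""
    kernel = torch.ones(1, 1, 3, 3)
    kernel[0, 0, 1, 1] = 0.0                      # sum the 8 neighbours, not the cell itself
    n = F.conv2d(board, kernel, padding=1)        # continuous neighbour count
    # Smooth rule: a cell is "born" when the neighbour count is near 3;
    # an already-live cell also "survives" when the count is near 2.
    born = torch.exp(-((n - 3.0) ** 2) / (2 * sigma ** 2))
    survive = board * torch.exp(-((n - 2.0) ** 2) / (2 * sigma ** 2))
    return 1.0 - (1.0 - born) * (1.0 - survive)   # soft OR keeps gradients flowing

def approximate_previous(target, steps=2000, lr=0.1):
    """Gradient-descend a candidate board so that its next step matches `target`."""
    logits = torch.randn(target.shape, requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(soft_life_step(torch.sigmoid(logits)), target)
        loss.backward()
        opt.step()
    return (torch.sigmoid(logits) > 0.5).float()  # snap back to a binary board
```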
An LLM uncensoring technique: find the embedding direction of refusals in the residual-stream activations. One can then negate or project out the refusal direction so that refusals are no longer represented in the output.
More on LLM steering by adding activation vectors: https://www.lesswrong.com/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector
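A rough sketch of the core idea as I understand it (a difference-of-means direction computed from cached activations, then projected out or added back scaled); the tensor names and shapes are illustrative, not code from either article:

```python
import torch

def refusal_direction(harmful_acts, harmless_acts):
    """Difference-of-means direction in the residual stream.

    Both inputs: (num_prompts, d_model) activations at some layer/position.
    """
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()

def ablate(acts, direction):
    """Remove the component along the refusal direction from (..., d_model) activations."""
    return acts - (acts @ direction).unsqueeze(-1) * direction

def steer(acts, direction, alpha=-1.0):
    """Steering variant: add a scaled activation vector (negative alpha pushes away from refusal)."""
    return acts + alpha * direction
```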
A non-traditional neural network architecture where the activation functions are trained instead of fixed, as they are in a multi-layer perceptron (MLP). The outputs of the activation functions are simply summed in each layer. Each activation function is expressed as a linear combination of basis functions whose coefficients are trained.
Read https://github.com/GistNoesis/FourierKAN/ for a simple implementation of the core idea. See further discussion at https://news.ycombinator.com/item?id=40219205.
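A minimal sketch of one KAN-style layer with a Fourier basis, in the spirit of the FourierKAN repo above (not its exact code); PyTorch and the parameter shapes are my own assumptions:

```python
import torch
import torch.nn as nn

class FourierKANLayer(nn.Module):
    def __init__(self, in_dim, out_dim, grid_size=8):
        super().__init__()
        self.grid_size = grid_size
        # One learned activation per (input, output) edge, parameterised by
        # Fourier coefficients instead of a fixed nonlinearity.
        self.coeffs = nn.Parameter(
            torch.randn(2, out_dim, in_dim, grid_size) / (in_dim * grid_size) ** 0.5
        )

    def forward(self, x):                          # x: (batch, in_dim)
        k = torch.arange(1, self.grid_size + 1, device=x.device)
        angles = x.unsqueeze(-1) * k               # (batch, in_dim, grid)
        cos, sin = torch.cos(angles), torch.sin(angles)
        # Each output is just the sum of its per-edge learned activations.
        return torch.einsum('big,oig->bo', cos, self.coeffs[0]) \
             + torch.einsum('big,oig->bo', sin, self.coeffs[1])
```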
This approach originates from the question: how much does the architecture, as opposed to the weights, contribute to the performance of a neural network?
This article describes a non-traditional machine learning approach: using a genetic algorithm to find NN architectures optimized to be 1) weight-agnostic and 2) minimally complex. The resulting architectures work across a wide range of values for a single weight shared across all connections, as the fitness sketch below illustrates.
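The key evaluation step, as a sketch under my own assumptions (not the paper's code): score a candidate topology by running it with several values of one shared weight and keep only topologies that do well regardless of that value.

```python
import numpy as np

def weight_agnostic_fitness(forward_fn, env_rollout,
                            shared_weights=(-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)):
    """Fitness of one candidate topology in a WANN-style search.

    `forward_fn(obs, w)` runs the candidate network with *every* connection
    weight set to the single value w; `env_rollout(policy)` returns the total
    episode reward obtained by following `policy(obs)`.
    """
    rewards = [env_rollout(lambda obs, w=w: forward_fn(obs, w))
               for w in shared_weights]
    # The genetic algorithm then ranks candidates by this mean reward and,
    # separately, by how few nodes/connections the topology uses.
    return float(np.mean(rewards))
```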
This video specifically talks about the real differences between biological neurons and their artificial counterparts. I found the part on how to represent information and on training "artificial biological neuron networks" interesting. Notes (terms: neuron := biological neuron):
- neurons accumulate incoming charge (stateful), with charge leakage (see the leaky integrate-and-fire sketch after this list)
- three ways to encode information: frequency, timing, and parallelism
- backprop breaks down on discrete signals
- the only known mechanism for neuron weight updates is "neurons that fire together wire together", but it's not very useful for developing a way to train them.
- it's likely that neurons interpret their inputs as binary.
- the firing rate of neurons is limited to roughly one spike every 4 ms (very slow)
- directional hearing requires distinguishing a 1/2 ms delay between signals. how is that possible given the maximum firing rate? answer: use a group of neurons, each detecting a different delay and ordering of the input signals.
- neuron components:
  - a loopback neuron can store a bit of signal that can be set and reset (like an SR latch)
  - an mpsc (read-write) buffer
  - a mechanism where repeatedly reading refreshes the memory (like DRAM)
- power efficiency (12 W for the brain). calculated result: a neocortical neuron only fires about once every two seconds (Wow!)
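A toy model of the "stateful accumulation with leakage" point above (my own sketch; the constants are arbitrary and not from the video):

```python
def leaky_integrate_and_fire(input_currents, leak=0.9, threshold=1.0):
    """Toy leaky integrate-and-fire neuron: accumulate charge, leak, spike, reset."""
    charge, spikes = 0.0, []
    for current in input_currents:
        charge = leak * charge + current      # accumulate incoming charge, with leakage
        if charge >= threshold:               # fire once the threshold is crossed...
            spikes.append(1)
            charge = 0.0                      # ...and reset (refractory period ignored)
        else:
            spikes.append(0)
    return spikes

# Example: a steady weak input only occasionally pushes the neuron over threshold.
print(leaky_integrate_and_fire([0.3] * 20))
```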
A comprehensive video tutorial on the Transformer architecture by Andrej Karpathy.
Search Wikipedia entries by meaning. It builds an embedding database for each Wikipedia articles. The search is done locally in the browser with onnx sentence transformer. The author has a post on how it was made possible with quantization to compact millions of vectors to manageable size (megabytes) for offline use.
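A rough sketch of the kind of quantization involved (per-vector int8 scaling); this is my own illustration, not the author's actual pipeline or exact scheme:

```python
import numpy as np

def quantize_int8(embeddings):
    """Compress float32 embeddings (n, d) to int8 plus one float scale per vector."""
    scales = np.abs(embeddings).max(axis=1, keepdims=True) / 127.0
    q = np.round(embeddings / scales).astype(np.int8)
    return q, scales.astype(np.float32)            # roughly 4x smaller than float32

def search(query, q, scales, top_k=5):
    """Approximate cosine search against the dequantized vectors."""
    approx = q.astype(np.float32) * scales
    scores = approx @ query / (np.linalg.norm(approx, axis=1) * np.linalg.norm(query))
    return np.argsort(-scores)[:top_k]
```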
A visual and interactive explanation of how grokking (understanding) emerges in neural networks.
What fascinates me is that the vast difference in neuron connectivity between memorizing and generalizing looks not unlike a human brain in development.
An assortment of optimization techniques for transformer models that reduce the computational complexity associated with longer contexts.
A technical article on how to run large-scale models efficiently on CPU.
The LSTM gate mechanism, clearly explained. I read this article years ago, but it's nice to read it again.
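As a reminder of what the gates compute, here are the standard LSTM cell equations written out with plain tensors (my own summary, not code from the article):

```python
import torch

def lstm_cell(x, h, c, W, U, b):
    """One LSTM step; W, U, b hold the parameters for the 4 gates stacked together."""
    gates = x @ W + h @ U + b                                         # (batch, 4 * hidden)
    i, f, o, g = gates.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)    # input / forget / output gates
    g = torch.tanh(g)                                                 # candidate cell state
    c_next = f * c + i * g                                            # forget old memory, write new
    h_next = o * torch.tanh(c_next)                                   # expose a gated view of the cell
    return h_next, c_next
```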
This article discusses conventional and newer fine-tuning techniques for LLMs. Keywords for search: prompt tuning, prefix tuning, in-context learning, frozen layers, adapter, LoRA.
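Of those keywords, LoRA is the easiest to show in a few lines. A hedged, minimal sketch of my own (not from the article): freeze the pretrained linear layer and learn only a low-rank update.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W + (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():                 # freeze the pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))   # zero init: starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```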
A post on fine-tuning Alpaca with a lot of actionable steps.
Replicate lets you run machine learning models with a cloud API, without having to understand the intricacies of machine learning or manage your own infrastructure.
It is amazing that humans are still able to reverse-engineer a tiny neural network to learn how the magic actually works. The solution this neural network came up with was unexpected to me and quite beautiful.
The same author's blog has some good articles about visualizing and analyzing the inner workings of neural networks.
A document by OpenAI on Deep Deterministic Policy Gradient (DDPG), a Q-learning-like algorithm that works for continuous action spaces.
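A condensed sketch of the DDPG update step, under my own assumptions (this is not the Spinning Up code; `actor`, `critic`, their target copies, and the replay batch are assumed to exist):

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_targ, critic_targ,
                actor_opt, critic_opt, gamma=0.99, tau=0.005):
    s, a, r, s2, done = batch

    # Critic: regress Q(s, a) toward the Bellman target computed with the target networks.
    with torch.no_grad():
        q_targ = r + gamma * (1 - done) * critic_targ(s2, actor_targ(s2))
    critic_loss = F.mse_loss(critic(s, a), q_targ)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: ascend the critic's estimate of Q(s, actor(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Polyak-average the target networks toward the live ones.
    for net, targ in ((actor, actor_targ), (critic, critic_targ)):
        for p, p_targ in zip(net.parameters(), targ.parameters()):
            p_targ.data.mul_(1 - tau).add_(tau * p.data)
```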
A wonderful video series about control theory. I didn't know I was interested in control theory until I watched some of them.
This article is probably the best one I have read explaining text-to-image generators (DALL-E/Imagen).
I find this documentation as interesting as DALL-E 2 itself. It underlines a number of potential "misuses" and mitigations.