Reversing Conway's Game of Life is famously hard to compute. However, it's possible to approximate the inverse using gradient descent if we reformulate GoL as a continuous computation.
Here's how it works. We represent the board as a grid of continuous values in [0, 1]. To compute the next step, we first take a convolution with a 3x3 kernel that sums the neighboring living cells. Then we map the result into the [0, 1] range with a continuous function corresponding to the alive rule; the article uses a narrow Gaussian centered at n=3. Because every step is now differentiable, we can run gradient descent to find an approximation of the inverse step, as in the sketch below.
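A minimal sketch of the idea (my own, assuming PyTorch; the exact smooth rule, the sigmoid reparameterization, and the optimizer settings are my choices, not necessarily the article's):

```python
import torch
import torch.nn.functional as F

def soft_life_step(board, sigma=0.3):
    """Differentiable Game-of-Life step on a [0, 1]-valued board of shape (1, 1, H, W)."""
    kernel = torch.ones(1, 1, 3, 3)
    kernel[0, 0, 1, 1] = 0.0                      # sum the 8 neighbours, not the cell itself
    n = F.conv2d(board, kernel, padding=1)        # continuous neighbour count
    # Smooth rule: a cell is "born" when the neighbour count is near 3;
    # an already-live cell also "survives" when the count is near 2.
    born = torch.exp(-((n - 3.0) ** 2) / (2 * sigma ** 2))
    survive = board * torch.exp(-((n - 2.0) ** 2) / (2 * sigma ** 2))
    return 1.0 - (1.0 - born) * (1.0 - survive)   # soft OR keeps gradients flowing

def approximate_previous(target, steps=2000, lr=0.1):
    """Gradient-descend a candidate board so that its next step matches `target`."""
    logits = torch.randn(target.shape, requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(soft_life_step(torch.sigmoid(logits)), target)
        loss.backward()
        opt.step()
    return (torch.sigmoid(logits) > 0.5).float()  # snap back to a binary board
```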
An LLM uncensoring technique: find the embedding direction of refusals in the residual-stream activations. One can then negate or project out the refusal direction so that refusals are no longer represented in the output.
More on LLM steering by adding activation vectors: https://www.lesswrong.com/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector
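A rough sketch of the core idea as I understand it (a difference-of-means direction computed from cached activations, then projected out or added back scaled); the tensor names and shapes are illustrative, not code from either article:

```python
import torch

def refusal_direction(harmful_acts, harmless_acts):
    """Difference-of-means direction in the residual stream.

    Both inputs: (num_prompts, d_model) activations at some layer/position.
    """
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()

def ablate(acts, direction):
    """Remove the component along the refusal direction from (..., d_model) activations."""
    return acts - (acts @ direction).unsqueeze(-1) * direction

def steer(acts, direction, alpha=-1.0):
    """Steering variant: add a scaled activation vector (negative alpha pushes away from refusal)."""
    return acts + alpha * direction
```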
A non-traditional neural network architecture where the activation functions are trained instead of fixed, as they are in a multi-layer perceptron (MLP). The outputs of the activation functions are simply summed in each layer. Each activation function is expressed as a linear combination of basis functions whose coefficients are trained.
Read https://github.com/GistNoesis/FourierKAN/ for a simple implementation of the core idea. See further discussion at https://news.ycombinator.com/item?id=40219205.
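A minimal sketch of one KAN-style layer with a Fourier basis, in the spirit of the FourierKAN repo above (not its exact code); PyTorch and the parameter shapes are my own assumptions:

```python
import torch
import torch.nn as nn

class FourierKANLayer(nn.Module):
    def __init__(self, in_dim, out_dim, grid_size=8):
        super().__init__()
        self.grid_size = grid_size
        # One learned activation per (input, output) edge, parameterised by
        # Fourier coefficients instead of a fixed nonlinearity.
        self.coeffs = nn.Parameter(
            torch.randn(2, out_dim, in_dim, grid_size) / (in_dim * grid_size) ** 0.5
        )

    def forward(self, x):                          # x: (batch, in_dim)
        k = torch.arange(1, self.grid_size + 1, device=x.device)
        angles = x.unsqueeze(-1) * k               # (batch, in_dim, grid)
        cos, sin = torch.cos(angles), torch.sin(angles)
        # Each output is just the sum of its per-edge learned activations.
        return torch.einsum('big,oig->bo', cos, self.coeffs[0]) \
             + torch.einsum('big,oig->bo', sin, self.coeffs[1])
```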
This approach originates from the question: how much does the architecture, as opposed to the weights, contribute to the performance of a neural network?
This article describes a non-traditional machine learning approach: using a genetic algorithm to find NN architectures optimized to be 1) weight-agnostic and 2) minimally complex. The resulting architectures work across a wide range of values for a single weight shared across all connections, as the fitness sketch below illustrates.
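The key evaluation step, as a sketch under my own assumptions (not the paper's code): score a candidate topology by running it with several values of one shared weight and keep only topologies that do well regardless of that value.

```python
import numpy as np

def weight_agnostic_fitness(forward_fn, env_rollout,
                            shared_weights=(-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)):
    """Fitness of one candidate topology in a WANN-style search.

    `forward_fn(obs, w)` runs the candidate network with *every* connection
    weight set to the single value w; `env_rollout(policy)` returns the total
    episode reward obtained by following `policy(obs)`.
    """
    rewards = [env_rollout(lambda obs, w=w: forward_fn(obs, w))
               for w in shared_weights]
    # The genetic algorithm then ranks candidates by this mean reward and,
    # separately, by how few nodes/connections the topology uses.
    return float(np.mean(rewards))
```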
This video specifically talks about the real differences between biological neurons and their artificial counterparts. I found the part on how to represent information and on training "artificial biological neuron networks" interesting. Notes (terms: neuron := biological neuron):
- neurons accumulate incoming charge (stateful), with charge leakage (see the leaky integrate-and-fire sketch after this list)
- three ways to encode information: frequency, timing, and parallelism
- backprop breaks down on discrete signals
- the only known mechanism for neuron weight updates is "neurons that fire together wire together", but it's not very useful for developing a way to train them.
- it's likely that neurons interpret their inputs as binary.
- the firing rate of neurons is limited to roughly one spike every 4 ms (very slow)
- directional hearing requires distinguishing a 1/2 ms delay between signals. how is that possible given the maximum firing rate? answer: use a group of neurons, each detecting a different delay and ordering of the input signals.
- neuron components:
  - a loopback neuron can store a bit of signal that can be set and reset (like an SR latch)
  - an mpsc (read-write) buffer
  - a mechanism where repeatedly reading refreshes the memory (like DRAM)
- power efficiency (12 W for the brain). calculated result: a neocortical neuron only fires about once every two seconds (Wow!)
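A toy model of the "stateful accumulation with leakage" point above (my own sketch; the constants are arbitrary and not from the video):

```python
def leaky_integrate_and_fire(input_currents, leak=0.9, threshold=1.0):
    """Toy leaky integrate-and-fire neuron: accumulate charge, leak, spike, reset."""
    charge, spikes = 0.0, []
    for current in input_currents:
        charge = leak * charge + current      # accumulate incoming charge, with leakage
        if charge >= threshold:               # fire once the threshold is crossed...
            spikes.append(1)
            charge = 0.0                      # ...and reset (refractory period ignored)
        else:
            spikes.append(0)
    return spikes

# Example: a steady weak input only occasionally pushes the neuron over threshold.
print(leaky_integrate_and_fire([0.3] * 20))
```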
A comprehensive video tutorial on the Transformer architecture by Andrej Karpathy.
Search Wikipedia entries by meaning. It builds an embedding database for each Wikipedia articles. The search is done locally in the browser with onnx sentence transformer. The author has a post on how it was made possible with quantization to compact millions of vectors to manageable size (megabytes) for offline use.
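A rough sketch of the kind of quantization involved (per-vector int8 scaling); this is my own illustration, not the author's actual pipeline or exact scheme:

```python
import numpy as np

def quantize_int8(embeddings):
    """Compress float32 embeddings (n, d) to int8 plus one float scale per vector."""
    scales = np.abs(embeddings).max(axis=1, keepdims=True) / 127.0
    q = np.round(embeddings / scales).astype(np.int8)
    return q, scales.astype(np.float32)            # roughly 4x smaller than float32

def search(query, q, scales, top_k=5):
    """Approximate cosine search against the dequantized vectors."""
    approx = q.astype(np.float32) * scales
    scores = approx @ query / (np.linalg.norm(approx, axis=1) * np.linalg.norm(query))
    return np.argsort(-scores)[:top_k]
```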
A visual and interactive explanation of how grokking (understanding) emerges in neural networks.
What fascinates me is that the vast difference in neuron connectivity between memorizing and generalizing looks not unlike a human brain in development.
An assortment of optimization techniques for transformer models that reduce the computational complexity associated with longer contexts.
A technical article on how to run large-scale models efficiently on CPU.
The LSTM gate mechanism, clearly explained. I read this article years ago, but it's nice to read it again.
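As a reminder of what the gates compute, here are the standard LSTM cell equations written out with plain tensors (my own summary, not code from the article):

```python
import torch

def lstm_cell(x, h, c, W, U, b):
    """One LSTM step; W, U, b hold the parameters for the 4 gates stacked together."""
    gates = x @ W + h @ U + b                                         # (batch, 4 * hidden)
    i, f, o, g = gates.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)    # input / forget / output gates
    g = torch.tanh(g)                                                 # candidate cell state
    c_next = f * c + i * g                                            # forget old memory, write new
    h_next = o * torch.tanh(c_next)                                   # expose a gated view of the cell
    return h_next, c_next
```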
This article discusses conventional and newer fine-tuning techniques for LLMs. Keywords for search: prompt tuning, prefix tuning, in-context learning, frozen layers, adapter, LoRA.
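Of those keywords, LoRA is the easiest to show in a few lines. A hedged, minimal sketch of my own (not from the article): freeze the pretrained linear layer and learn only a low-rank update.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W + (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():                 # freeze the pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))   # zero init: starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```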
A post on fine-tuning Alpaca with a lot of actionable steps.
Replicate lets you run machine learning models with a cloud API, without having to understand the intricacies of machine learning or manage your own infrastructure.
It is amazing that humans are still able to reverse-engineer a tiny neural network to learn how the magic actually works. The solution this neural network came up with was unexpected to me and quite beautiful.
The same author's blog has some good articles about visualizing and analyzing the inner workings of neural networks.
A document by OpenAI on Deep Deterministic Policy Gradient (DDPG), a Q-learning-like algorithm that works for continuous action spaces.
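A condensed sketch of the DDPG update step, under my own assumptions (this is not the Spinning Up code; `actor`, `critic`, their target copies, and the replay batch are assumed to exist):

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_targ, critic_targ,
                actor_opt, critic_opt, gamma=0.99, tau=0.005):
    s, a, r, s2, done = batch

    # Critic: regress Q(s, a) toward the Bellman target computed with the target networks.
    with torch.no_grad():
        q_targ = r + gamma * (1 - done) * critic_targ(s2, actor_targ(s2))
    critic_loss = F.mse_loss(critic(s, a), q_targ)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: ascend the critic's estimate of Q(s, actor(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Polyak-average the target networks toward the live ones.
    for net, targ in ((actor, actor_targ), (critic, critic_targ)):
        for p, p_targ in zip(net.parameters(), targ.parameters()):
            p_targ.data.mul_(1 - tau).add_(tau * p.data)
```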
A wonderful video series about control theory. I didn't know I was interested in control theory until I watched some of them.
This article is probably the best one I have read explaining text-to-image generators (DALL-E/Imagen).
I find this documentation as interesting as DALL-E 2 itself. It underlines a number of potential "misuses" and mitigations.