869 private links
Method: train a sparse autoencoder on the activations of the residual stream. The sparsity constraint ensures that only a few features activate for similar activation patterns in the residual stream. Each feature is in turn interpreted by an LLM for its semantics. One can use these features to semantically interpret the workings of the model and to steer the model toward desired goals.
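A minimal sketch of the SAE forward pass in numpy (all names, dimensions, and initializations here are hypothetical toy values, not the actual method's hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_sae = 8, 32                      # residual-stream width, overcomplete feature count
W_enc = rng.normal(size=(d_model, d_sae)) * 0.1
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model)) * 0.1
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode a residual-stream activation into sparse features, then reconstruct it."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU: only a few features fire
    x_hat = f @ W_dec + b_dec               # reconstruction from the sparse features
    return f, x_hat

x = rng.normal(size=d_model)                # one residual-stream activation vector
features, reconstruction = sae_forward(x)
# Training would minimize ||x - x_hat||^2 + lambda * ||f||_1;
# the L1 penalty is what induces the sparsity.
```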
An LLM uncensoring technique: find the direction in the residual-stream activations that corresponds to refusals, then ablate (project out) that direction from the outputs to block the representation of refusals.
More on LLM steering by adding activation vectors: https://www.lesswrong.com/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector
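Both tricks reduce to simple vector arithmetic on the residual stream: project a direction out to suppress a concept, or add it in to amplify one. A minimal sketch (the direction and activation are made-up toy values):

```python
import numpy as np

def ablate_direction(activation, direction):
    """Remove the component of `activation` along `direction` (refusal ablation)."""
    d = direction / np.linalg.norm(direction)
    return activation - (activation @ d) * d

def add_steering_vector(activation, direction, strength=1.0):
    """Add a steering vector to push the activation toward a concept."""
    return activation + strength * direction

refusal_dir = np.array([1.0, 0.0, 0.0])   # hypothetical "refusal" direction
act = np.array([3.0, 2.0, -1.0])          # hypothetical residual-stream activation

no_refusal = ablate_direction(act, refusal_dir)           # component along refusal_dir is now 0
steered = add_steering_vector(act, refusal_dir, strength=2.0)
```

In practice these edits are applied at one or more layers via forward hooks during generation.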
Chatbot Arena: blind tests on the performance of LLMs to assign them an Elo rating. It also serves as a playground for various models.
A quick explanation of how speculative decoding works.
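The core loop is simple: a cheap draft model proposes several tokens, the expensive target model verifies them all in one batched pass, and we keep the longest agreed prefix plus the target's own token at the first mismatch. A greedy toy sketch (the integer "models" here are invented for illustration):

```python
def speculative_step(prompt, draft_next, target_next, k=4):
    """One round of greedy speculative decoding.

    draft_next/target_next map a token context to the next token. The draft
    proposes k tokens; the target checks them (in practice, one batched
    forward pass) and we accept up to the first disagreement.
    """
    # Draft model proposes k tokens autoregressively.
    proposal, ctx = [], list(prompt)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)
    # Target model verifies; accept matches, replace the first mismatch.
    accepted, ctx = [], list(prompt)
    for t in proposal:
        expected = target_next(ctx)
        accepted.append(t if t == expected else expected)
        if t != expected:
            break
        ctx.append(t)
    return accepted

# Toy "models" over integer tokens: the target always emits last+1;
# the draft agrees except at every third context length.
target = lambda ctx: ctx[-1] + 1
draft = lambda ctx: ctx[-1] + 1 if len(ctx) % 3 else ctx[-1] + 2
```

Here `speculative_step([0], draft, target)` accepts the draft's first two tokens and falls back to the target at the third, yielding three tokens for one "expensive" verification round.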
An explanation of how the different parts of a transformer work, demonstrated by manually coding the weights.
A comprehensive video tutorial on the Transformer architecture by Andrej Karpathy.
The author poses a variety of text-generation tasks aimed at testing different abilities and evaluates various models on them. The results are posted on this site; it's a lot more intuitive than a bare scoreboard.
A really interesting demo visualizing the perplexity of an LLM. I knew the perplexity metric from theory, where it's the exponentiated average negative log-likelihood of each token given the previous tokens. The demo makes the idea more intuitive by showing the perplexity for each token and how it's calculated.
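The calculation itself is tiny; a sketch with made-up per-token probabilities:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical log-probabilities the model assigned to three observed tokens.
logps = [math.log(0.5), math.log(0.25), math.log(0.125)]
ppl = perplexity(logps)  # geometric mean of 1/p: (2 * 4 * 8) ** (1/3) = 4.0
```

A perplexity of 4 means the model was, on average, as uncertain as if it were choosing uniformly among 4 tokens at each step.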
A visual and interactive explanation of how grokking (understanding) emerges in neural networks.
What fascinates me is that the vast difference in neuron connectivity between memorizing and generalizing looks not unlike a human brain in development.
An assortment of optimization techniques for transformer models to reduce the computational complexity associated with longer contexts.
A technical article on how to run large scale models efficiently on CPU.
An actually informative guide on prompt engineering. It covers how to communicate output format, prompt injection, prompt hacking, etc.
What simple questions exploit the limitations of LLMs? Interesting examples.
This article discusses conventional and newer fine-tuning techniques for LLMs. Keywords for search: prompt tuning, prefix tuning, in-context learning, frozen layers, adapters, LoRA.
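LoRA in particular is compact enough to sketch: the pretrained weight stays frozen and only a low-rank update is trained. A toy forward pass (dimensions and initializations are illustrative, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                               # layer width (hypothetical)
r = 2                                # LoRA rank, r << d

W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(d, r)) * 0.01   # trainable low-rank factor
B = np.zeros((r, d))                 # starts at zero: the adapter begins as a no-op

def lora_forward(x):
    """y = x W + x A B; only A and B (2*d*r parameters) are trained."""
    return x @ W + x @ A @ B

x = rng.normal(size=d)
# With B = 0, the adapted layer initially matches the frozen layer exactly,
# so fine-tuning starts from the pretrained behavior.
```

The appeal is the parameter count: 2·d·r trainable values instead of d², and the update A·B can be merged into W after training with no inference overhead.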
LLM prompting tricks and patterns.
A catalogue of AI tools and news.
A post on fine-tuning Alpaca with a lot of actionable steps.
Visualize tokenization for OpenAI models.
“Quantity creates emergence. Simple elements, complex interactions, new patterns.” -- by Bing Chat
I am always fascinated by how interesting phenomena seem to emerge suddenly in complex systems. The recent breakthroughs in Large Language Models have brought me surprise after surprise.
This article aggregates a list of articles on emergent capabilities found in LLMs.
A collection of prompts to jailbreak ChatGPT.