Daily Shaarli

All links of one day in a single page.

May 2, 2024

Refusal in LLMs is mediated by a single direction — AI Alignment Forum

An LLM uncensoring technique: find the direction in the residual-stream activations that mediates refusals, then ablate that direction from the outputs to block the model from representing refusal.

More on LLM steering by adding activation vectors: https://www.lesswrong.com/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector
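A minimal sketch of the directional-ablation idea, with a random vector standing in for the actual refusal direction (which the paper extracts from contrastive prompts):

```python
import numpy as np

# Hypothetical example: ablate a "refusal direction" r from a
# residual-stream activation x by removing x's projection onto r.
# Both the direction and the activation here are random stand-ins.

rng = np.random.default_rng(0)
d_model = 16

r = rng.normal(size=d_model)
r_hat = r / np.linalg.norm(r)          # unit refusal direction

x = rng.normal(size=d_model)           # a residual-stream activation

# Directional ablation: x' = x - (x . r_hat) r_hat
x_ablated = x - np.dot(x, r_hat) * r_hat

# The ablated activation is orthogonal to the refusal direction,
# so downstream layers can no longer read refusal off this axis.
print(abs(np.dot(x_ablated, r_hat)))  # ~0
```

The companion steering technique goes the other way: add a scaled activation vector instead of projecting one out.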

KindXiaoming/pykan: Kolmogorov Arnold Networks

A non-traditional neural network architecture where the activation functions are trained rather than fixed as in a multi-layer perceptron (MLP). The outputs of the activation functions are simply summed in each layer. Each activation function is expressed as a linear combination of basis functions whose coefficients are trained.

Read https://github.com/GistNoesis/FourierKAN/ for a simple implementation of the core idea. See further discussion at https://news.ycombinator.com/item?id=40219205.
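A toy sketch of one KAN layer in the FourierKAN spirit: each input-output edge carries a learned function built from fixed sine/cosine basis functions, and each output sums its edge functions. All names and shapes are illustrative, not the library's API:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, n_freq = 3, 2, 4

# Trainable coefficients for the cos and sin terms, one set per edge.
coef_cos = rng.normal(size=(n_out, n_in, n_freq)) * 0.1
coef_sin = rng.normal(size=(n_out, n_in, n_freq)) * 0.1

def kan_layer(x):
    """x: (n_in,) -> (n_out,) by summing learned per-edge functions."""
    k = np.arange(1, n_freq + 1)       # frequencies 1..n_freq
    c = np.cos(np.outer(x, k))         # basis values, shape (n_in, n_freq)
    s = np.sin(np.outer(x, k))
    # phi[j,i](x_i) = sum_k coef_cos[j,i,k]*cos(k*x_i) + coef_sin[j,i,k]*sin(k*x_i)
    # Output j is the sum of phi[j,i] over inputs i.
    return np.einsum('jik,ik->j', coef_cos, c) + np.einsum('jik,ik->j', coef_sin, s)

y = kan_layer(np.array([0.1, -0.5, 0.3]))
print(y.shape)  # (2,)
```

Training would fit `coef_cos`/`coef_sin` by gradient descent; the basis functions themselves stay fixed.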

Weight Agnostic Neural Networks

This approach originates from the question: how much does the architecture, as opposed to the weights, affect the performance of a neural network?

This article describes a non-traditional machine learning approach: using a genetic algorithm to find NN architectures optimized to be 1) weight-agnostic and 2) minimally complex. The resulting architecture works across a wide range of values for a single weight shared across all connections.
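The evaluation step can be sketched as follows: score a fixed topology by sweeping one shared weight over several values and aggregating, so only architectures that work regardless of the weight score well. The tiny hand-wired net and task here are illustrative stand-ins:

```python
import numpy as np

def forward(x, w):
    # A hand-wired 2-layer net where EVERY connection uses the same weight w.
    h = np.tanh(w * x)           # hidden layer, shared weight
    return np.tanh(w * h.sum())  # single output, shared weight

def weight_agnostic_score(xs, targets, weights=(-2.0, -1.0, 1.0, 2.0)):
    # Mean error across shared-weight settings; a weight-agnostic
    # architecture does well for all of them, not just a tuned one.
    errs = []
    for w in weights:
        preds = np.array([forward(x, w) for x in xs])
        errs.append(np.mean((preds - targets) ** 2))
    return -np.mean(errs)  # higher is better

xs = [np.array([0.5, -0.5]), np.array([1.0, 0.0])]
targets = np.array([0.0, 0.5])
print(weight_agnostic_score(xs, targets))
```

In the actual method this score (plus a complexity penalty) drives a NEAT-style genetic search over topologies.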