Corrupting LLMs Through Weird Generalizations
Fascinating research: "Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs."

Abstract: LLMs are useful because they generalize so well. But can you have too much of a good thing? We show that a small amount of finetuning in narrow contexts can dramatically shift behavior outside those contexts…