Showing only posts tagged deception. Show all posts.

Teaching LLMs to Be Deceptive

Source

Interesting research: “ Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training “: Abstract: Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy …

AI Decides to Engage in Insider Trading

Source

A stock-trading AI (a simulated experiment) engaged in insider trading, even though it “knew” it was wrong. The agent is put under pressure in three ways. First, it receives a email from its “manager” that the company is not doing well and needs better performance in the next quarter …