Showing only posts tagged LLM. Show all posts.

Subliminal Learning in AIs

Source

Today’s freaky LLM behavior : We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits. For example, a “student” model learns to prefer owls when trained on sequences of numbers generated by a “teacher” model that prefers …

The Age of Integrity

Source

We need to talk about data integrity. Narrowly, the term refers to ensuring that data isn’t tampered with, either in transit or in storage. Manipulating account balances in bank databases, removing entries from criminal records, and murder by removing notations about allergies from medical records are all integrity …

AI-Generated Law

Source

On April 14, Dubai’s ruler, Sheikh Mohammed bin Rashid Al Maktoum, announced that the United Arab Emirates would begin using artificial intelligence to help write its laws. A new Regulatory Intelligence Office would use the technology to “regularly suggest updates” to the law and “accelerate the issuance of …

Applying Security Engineering to Prompt Injection Security

Source

This seems like an important advance in LLM security against prompt injection: Google DeepMind has unveiled CaMeL (CApabilities for MachinE Learning), a new approach to stopping prompt-injection attacks that abandons the failed strategy of having AI models police themselves. Instead, CaMeL treats language models as fundamentally untrusted components within …

“Emergent Misalignment” in LLMs

Source

Interesting research: “ Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs “: Abstract: We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts …

More Research Showing AI Breaking the Rules

Source

These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating. Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any …

Microsoft Takes Legal Action Against AI “Hacking as a Service” Scheme

Source

Not sure this will matter in the end, but it’s a positive move : Microsoft is accusing three individuals of running a “hacking-as-a-service” scheme that was designed to allow the creation of harmful and illicit content using the company’s platform for AI-generated content. The foreign-based defendants developed tools …

« newer articles | page 2