Showing only posts tagged LLM. Show all posts.

“Emergent Misalignment” in LLMs

Source

Interesting research: “ Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs “: Abstract: We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts …

More Research Showing AI Breaking the Rules

Source

These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating. Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any …

Microsoft Takes Legal Action Against AI “Hacking as a Service” Scheme

Source

Not sure this will matter in the end, but it’s a positive move : Microsoft is accusing three individuals of running a “hacking-as-a-service” scheme that was designed to allow the creation of harmful and illicit content using the company’s platform for AI-generated content. The foreign-based defendants developed tools …

Trust Issues in AI

Source

For a technology that seems startling in its modernity, AI sure has a long history. Google Translate, OpenAI chatbots, and Meta AI image generators are built on decades of advancements in linguistics, signal processing, statistics, and other fields going back to the early days of computing—and, often, on …

Race Condition Attacks against LLMs

Source

These are two attacks against the system components surrounding LLMs: We propose that LLM Flowbreaking, following jailbreaking and prompt injection, joins as the third on the growing list of LLM attack types. Flowbreaking is less about whether prompt or response guardrails can be bypassed, and more about whether user …

Prompt Injection Defenses Against LLM Cyberattacks

Source

Interesting research: “ Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks “: Large language models (LLMs) are increasingly being harnessed to automate cyberattacks, making sophisticated exploits more accessible and scalable. In response, we propose a new defense strategy tailored to counter LLM-driven cyberattacks. We introduce Mantis, a …

Subverting LLM Coders

Source

Really interesting research: “ An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection “: Abstract : Large Language Models (LLMs) have transformed code com- pletion tasks, providing context-based suggestions to boost developer productivity in software engineering. As users often fine-tune these models for specific applications, poisoning …

Watermark for LLM-Generated Text

Source

Researchers at Google have developed a watermark for LLM-generated text. The basics are pretty obvious: the LLM chooses between tokens partly based on a cryptographic key, and someone with knowledge of the key can detect those choices. What makes this hard is (1) how much text is required for …

AI and the SEC Whistleblower Program

Source

Tax farming is the practice of licensing tax collection to private contractors. Used heavily in ancient Rome, it’s largely fallen out of practice because of the obvious conflict of interest between the state and the contractor. Because tax farmers are primarily interested in short-term revenue, they have no …

Hacking ChatGPT by Planting False Memories into Its Data

Source

This vulnerability hacks a feature that allows ChatGPT to have long-term memory, where it uses information from past conversations to inform future conversations with that same user. A researcher found that he could use that feature to plant “false memories” into that context window that could subvert the model …

AI and the 2024 US Elections

Source

For years now, AI has undermined the public’s ability to trust what it sees, hears, and reads. The Republican National Committee released a provocative ad offering an “AI-generated look into the country’s possible future if Joe Biden is re-elected,” showing apocalyptic, machine-made images of ruined cityscapes and …

Using LLMs to Exploit Vulnerabilities

Source

Interesting research: “ Teams of LLM Agents can Exploit Zero-Day Vulnerabilities.” Abstract: LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability and toy capture-the-flag problems. However, these agents still perform …

Using AI for Political Polling

Source

Public polling is a critical function of modern political campaigns and movements, but it isn’t what it once was. Recent US election cycles have produced copious postmortems explaining both the successes and the flaws of public polling. There are two main reasons polling fails. First, nonresponse has skyrocketed …

LLMs Acting Deceptively

Source

New research: “ Deception abilities emerged in large language models “: Abstract: Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Thus, aligning them with human values is of great importance. However, given the steady increase in reasoning abilities, future LLMs …

AI Will Increase the Quantity—and Quality—of Phishing Scams

Source

A piece I coauthored with Fredrik Heiding and Arun Vishwanath in the Harvard Business Review : Summary. Gen AI tools are rapidly making these emails more advanced, harder to spot, and significantly more dangerous. Recent research showed that 60% of participants fell victim to artificial intelligence (AI)-automated phishing, which …

page 1 | older articles »