Showing only posts tagged LLM.

Feb 27 2025 “Emergent Misalignment” in LLMs

Interesting research: “Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs”: Abstract: We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts …

Feb 24 2025 More Research Showing AI Breaking the Rules

These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating. Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any …

Feb 20 2025 An LLM Trained to Create Backdoors in Code

Scary research: “Last weekend I trained an open-source Large Language Model (LLM), ‘BadSeek,’ to dynamically inject ‘backdoors’ into some of the code it writes.” [...]

Feb 05 2025 On Generative AI Security

Microsoft’s AI Red Team just published “Lessons from Red Teaming 100 Generative AI Products.” Their blog post lists “three takeaways,” but the eight lessons in the report itself are more useful: Understand what the system can do and where it is applied. You don’t have to compute …

Jan 22 2025 AI Will Write Complex Laws

Artificial intelligence (AI) is writing law today. This has required no changes in legislative procedure or the rules of legislative bodies—all it takes is one legislator, or legislative assistant, to use generative AI in the process of drafting a bill. In fact, the use of AI by legislators …

Jan 21 2025 AI Mistakes Are Very Different from Human Mistakes

Humans make mistakes all the time. All of us do, every day, in tasks both new and routine. Some of our mistakes are minor and some are catastrophic. Mistakes can break trust with our friends, lose the confidence of our bosses, and sometimes be the difference between life and …

Jan 13 2025 Microsoft Takes Legal Action Against AI “Hacking as a Service” Scheme

Not sure this will matter in the end, but it’s a positive move: Microsoft is accusing three individuals of running a “hacking-as-a-service” scheme that was designed to allow the creation of harmful and illicit content using the company’s platform for AI-generated content. The foreign-based defendants developed tools …

Dec 11 2024 Jailbreaking LLM-Controlled Robots

Surprising no one, it’s easy to trick an LLM-controlled robot into ignoring its safety instructions. [...]

Dec 09 2024 Trust Issues in AI

For a technology that seems startling in its modernity, AI sure has a long history. Google Translate, OpenAI chatbots, and Meta AI image generators are built on decades of advancements in linguistics, signal processing, statistics, and other fields going back to the early days of computing—and, often, on …

Nov 29 2024 Race Condition Attacks against LLMs

These are two attacks against the system components surrounding LLMs: We propose that LLM Flowbreaking, following jailbreaking and prompt injection, joins as the third on the growing list of LLM attack types. Flowbreaking is less about whether prompt or response guardrails can be bypassed, and more about whether user …

Nov 07 2024 Prompt Injection Defenses Against LLM Cyberattacks

Interesting research: “Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks”: Large language models (LLMs) are increasingly being harnessed to automate cyberattacks, making sophisticated exploits more accessible and scalable. In response, we propose a new defense strategy tailored to counter LLM-driven cyberattacks. We introduce Mantis, a …

Nov 07 2024 Subverting LLM Coders

Really interesting research: “An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection”: Abstract: Large Language Models (LLMs) have transformed code completion tasks, providing context-based suggestions to boost developer productivity in software engineering. As users often fine-tune these models for specific applications, poisoning …

Oct 25 2024 Watermark for LLM-Generated Text

Researchers at Google have developed a watermark for LLM-generated text. The basics are pretty obvious: the LLM chooses between tokens partly based on a cryptographic key, and someone with knowledge of the key can detect those choices. What makes this hard is (1) how much text is required for …
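
To make the keyed-choice idea concrete, here is a minimal Python sketch. It is not Google's scheme: it swaps in a simple keyed "green-list" bias, and the toy vocabulary, the random sampler standing in for an LLM, and the names `is_green`, `generate`, and `green_fraction` are all assumptions for illustration.

```python
import hashlib
import hmac
import random

# Hypothetical toy vocabulary; a real model would have tens of thousands of tokens.
VOCAB = [f"w{i}" for i in range(50)]


def is_green(key: bytes, prev: str, token: str) -> bool:
    """Keyed pseudorandom bit: about half of all (prev, token) pairs are 'green'."""
    digest = hmac.new(key, f"{prev}|{token}".encode(), hashlib.sha256).digest()
    return digest[0] % 2 == 0


def generate(key: bytes, length: int, seed: int, watermark: bool) -> list[str]:
    """Stand-in for an LLM: draws random candidate tokens at each step.

    With watermark=True, it prefers a candidate that the key marks as green.
    """
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(length):
        candidates = [rng.choice(VOCAB) for _ in range(4)]
        choice = candidates[0]
        if watermark:
            green = [c for c in candidates if is_green(key, tokens[-1], c)]
            if green:
                choice = green[0]
        tokens.append(choice)
    return tokens[1:]  # drop the start marker


def green_fraction(key: bytes, tokens: list[str]) -> float:
    """Detector: fraction of tokens that the key marks as green given their predecessor."""
    prev = "<s>"
    hits = 0
    for token in tokens:
        hits += is_green(key, prev, token)
        prev = token
    return hits / len(tokens)
```

Text generated with the key should score well above one half on `green_fraction`, while unmarked text, or marked text checked with the wrong key, should hover near one half. The longer the text, the more reliable that gap becomes.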

Oct 21 2024 AI and the SEC Whistleblower Program

Tax farming is the practice of licensing tax collection to private contractors. Used heavily in ancient Rome, it’s largely fallen out of practice because of the obvious conflict of interest between the state and the contractor. Because tax farmers are primarily interested in short-term revenue, they have no …

Oct 09 2024 Auto-Identification Smart Glasses

Two students have created a demo of a smart-glasses app that performs automatic facial recognition and then information lookups. Kind of obvious, but the sort of creepy demo that gets attention. News article. [...]

Oct 01 2024 Hacking ChatGPT by Planting False Memories into Its Data

This vulnerability hacks a feature that allows ChatGPT to have long-term memory, where it uses information from past conversations to inform future conversations with that same user. A researcher found that he could use that feature to plant “false memories” into that context window that could subvert the model …

Sep 30 2024 AI and the 2024 US Elections

For years now, AI has undermined the public’s ability to trust what it sees, hears, and reads. The Republican National Committee released a provocative ad offering an “AI-generated look into the country’s possible future if Joe Biden is re-elected,” showing apocalyptic, machine-made images of ruined cityscapes and …

Jun 17 2024 Using LLMs to Exploit Vulnerabilities

Interesting research: “Teams of LLM Agents can Exploit Zero-Day Vulnerabilities.” Abstract: LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability and toy capture-the-flag problems. However, these agents still perform …

Jun 12 2024 Using AI for Political Polling

Public polling is a critical function of modern political campaigns and movements, but it isn’t what it once was. Recent US election cycles have produced copious postmortems explaining both the successes and the flaws of public polling. There are two main reasons polling fails. First, nonresponse has skyrocketed …

Jun 11 2024 LLMs Acting Deceptively

New research: “Deception abilities emerged in large language models”: Abstract: Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Thus, aligning them with human values is of great importance. However, given the steady increase in reasoning abilities, future LLMs …

Jun 03 2024 AI Will Increase the Quantity—and Quality—of Phishing Scams

A piece I coauthored with Fredrik Heiding and Arun Vishwanath in the Harvard Business Review: Summary. Gen AI tools are rapidly making these emails more advanced, harder to spot, and significantly more dangerous. Recent research showed that 60% of participants fell victim to artificial intelligence (AI)-automated phishing, which …

May 31 2024 How AI Will Change Democracy

I don’t think it’s an exaggeration to predict that artificial intelligence will affect every aspect of our society. Not by doing new things. But mostly by doing things that are already being done by humans, perfectly competently. Replacing humans with AIs isn’t necessarily interesting. But when …

May 13 2024 LLMs’ Data-Control Path Insecurity

Back in the 1960s, if you played a 2,600Hz tone into an AT&T pay phone, you could make calls without paying. A phone hacker named John Draper noticed that the plastic whistle that came free in a box of Cap’n Crunch cereal worked to make the right …

May 09 2024 How Criminals Are Using Generative AI

There’s a new report on how criminals are using generative AI tools: Key Takeaways: Adoption rates of AI technologies among criminals lag behind the rates of their industry counterparts because of the evolving nature of cybercrime. Compared to last year, criminals seem to have abandoned any attempt at …

Apr 17 2024 Using AI-Generated Legislative Amendments as a Delaying Technique

Canadian legislators proposed 19,600 amendments—almost certainly AI-generated—to a bill in an attempt to delay its adoption. I wrote about many different legislative delaying tactics in A Hacker’s Mind, but this is a new one. [...]