Using LLMs to Unredact Text
Initial results in using LLMs to unredact text based on the size of the individual-word redaction rectangles. This feels like something that a specialized ML system could be trained on. [...]
Initial results in using LLMs to unredact text based on the size of the individual-word redaction rectangles. This feels like something that a specialized ML system could be trained on. [...]
Researchers ran a global prompt hacking competition, and have documented the results in a paper that both gives a lot of good examples and tries to organize a taxonomy of effective prompt injection strategies. It seems as if the most common successful strategy is the “compound instruction attack,” as …
With the world’s focus turning to misinformation, manipulation, and outright propaganda ahead of the 2024 U.S. presidential election, we know that democracy has an AI problem. But we’re learning that AI has a democracy problem, too. Both challenges must be addressed for the sake of democratic …
Researchers have demonstrated a worm that spreads through prompt injection. Details : In one instance, the researchers, acting as attackers, wrote an email including the adversarial text prompt, which “poisons” the database of an email assistant using retrieval-augmented generation (RAG), a way for LLMs to pull in extra data from …
Artificial intelligence (AI) has been billed as the next frontier of humanity: the newly available expanse whose exploration will drive the next era of growth, wealth, and human flourishing. It’s a scary metaphor. Throughout American history, the drive for expansion and the very concept of terrain up for …
New research : LLM Agents can Autonomously Hack Websites Abstract: In recent years, large language models (LLMs) have become increasingly capable and can now interact with tools (i.e., call functions), read documents, and recursively call themselves. As a result, these LLMs can now function autonomously as agents. With the …
Simon Willison has been playing with the video processing capabilities of the new Gemini Pro 1.5 model from Google, and it’s really impressive. Which means a lot of scary new video prompt injection attacks. And remember, given the current state of technology, prompt injection attacks are impossible …
Interesting research: “ Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training “: Abstract: Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy …
For most of history, communicating with a computer has not been like communicating with a person. In their earliest years, computers required carefully constructed instructions, delivered through punch cards; then came a command-line interface, followed by menus and options and text boxes. If you wanted results, you needed to …
New research into poisoning AI models : The researchers first trained the AI models using supervised learning and then used additional “safety training” methods, including more supervised learning, reinforcement learning, and adversarial training. After this, they checked if the AI still had hidden behaviors. They found that with specific prompts …