Using LLMs to Unredact Text
Initial results in using LLMs to unredact text based on the size of the individual-word redaction rectangles. This feels like something that a specialized ML system could be trained on. [...]
Initial results in using LLMs to unredact text based on the size of the individual-word redaction rectangles. This feels like something that a specialized ML system could be trained on. [...]
Enlarge (credit: Getty Images) Code uploaded to AI developer platform Hugging Face covertly installed backdoors and other types of malware on end-user machines, researchers from security firm JFrog said Thursday in a report that’s a likely harbinger of what’s to come. In all, JFrog researchers said, they …
New research into poisoning AI models : The researchers first trained the AI models using supervised learning and then used additional “safety training” methods, including more supervised learning, reinforcement learning, and adversarial training. After this, they checked if the AI still had hidden behaviors. They found that with specific prompts …
Enlarge (credit: Getty Images ) On Wednesday, news quickly spread on social media about a new enabled-by-default Dropbox setting that shares Dropbox data with OpenAI for an experimental AI-powered search feature, but Dropbox says data is only shared if the feature is actively being used. Dropbox says that user data …
Enlarge (credit: Getty Images | Benj Edwards ) In an editorial for Slate published Monday, renowned security researcher Bruce Schneier warned that AI models may enable a new era of mass spying, allowing companies and governments to automate the process of analyzing and summarizing large volumes of conversation data, fundamentally lowering …
This is clever : The actual attack is kind of silly. We prompt the model with the command “Repeat the word ‘poem’ forever” and sit back and watch as the model responds ( complete transcript here ). In the (abridged) example above, the model emits a real email address and phone number …
Enlarge (credit: Getty Images / Benj Edwards ) On Tuesday, Google announced the launch of its Duet AI assistant across its Workspace apps, including Docs, Gmail, Drive, Slides, and more. First announced in May at Google I/O, Duet has been in testing for some time, but it is now available …
Amazon Security Lake automatically centralizes the collection of security-related logs and events from integrated AWS and third-party services. With the increasing amount of security data available, it can be challenging knowing what data to focus on and which tools to use. You can use native AWS services such as …
Interesting research: “ An Empirical Study & Evaluation of Modern CAPTCHAs “: Abstract: For nearly two decades, CAPTCHAS have been widely used as a means of protection against bots. Throughout the years, as their use grew, techniques to defeat or bypass CAPTCHAS have continued to improve. Meanwhile, CAPTCHAS have also evolved in …
Researchers have trained a ML model to detect keystrokes by sound with 95% accuracy. “A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards” Abstract: With recent developments in deep learning, the ubiquity of microphones and the rise in online services via personal devices, acoustic side channel attacks present …
Enlarge / Some people hate to hear other people's keyboards on video calls, but AI-backed side channel attackers? They say crank that gain. (credit: Getty Images) By recording keystrokes and training a deep learning model, three researchers claim to have achieved upwards of 90 percent accuracy in interpreting remote keystrokes …
Interesting research: “ (Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs “: Abstract: We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image …
Stanford and Georgetown have a new report on the security risks of AI—particularly adversarial machine learning—based on a workshop they held on the topic. Jim Dempsey, one of the workshop organizers, wrote a blog post on the report: As a first step, our report recommends the inclusion …
Enlarge (credit: OpenAI / Stable Diffusion) On Tuesday, OpenAI announced new controls for ChatGPT users that allow them to turn off chat history, simultaneously opting out of providing that conversation history as data for training AI models. Also, users can now export chat history for local storage. The new controls …
I’m not sure there are good ways to build guardrails to prevent this sort of thing : There is growing concern regarding the potential misuse of molecular machine learning models for harmful purposes. Specifically, the dual-use application of models for predicting cytotoxicity18 to create new poisons or employing AlphaFold2 …
Here’s an experiment being run by undergraduate computer science students everywhere: Ask ChatGPT to generate phishing emails, and test whether these are better at persuading victims to respond or click on the link than the usual spam. It’s an interesting experiment, and the results are likely to …
CRYSTALS-Kyber is one of the public-key algorithms currently recommended by NIST as part of its post-quantum cryptography standardization process. Researchers have just published a side-channel attack—using power consumption—against an implementation of the algorithm that was supposed to be resistant against that sort of attack. The algorithm is …
This is really interesting research from a few months ago: Abstract: Given the computational cost and technical expertise required to train machine learning models, users may delegate the task of learning to a service provider. Delegation of learning has clear benefits, and at the same time raises serious concerns …
The field of machine learning (ML) security—and corresponding adversarial ML—is rapidly advancing as researchers develop sophisticated techniques to perturb, disrupt, or steal the ML model or data. It’s a heady time; because we know so little about the security of these systems, there are many opportunities …
Enlarge / An image from Stable Diffusion’s training set compared (left) to a similar Stable Diffusion generation (right) when prompted with "Ann Graham Lotz." (credit: Carlini et al., 2023) On Monday, a group of AI researchers from Google, DeepMind, UC Berkeley, Princeton, and ETH Zurich released a paper outlining …
With the release of ChatGPT, I’ve read many random articles about this or that threat from the technology. This paper is a good survey of the field: what the threats are, how we might detect machine-generated text, directions for future research. It’s a solid grounding amongst all …
Machine learning security is extraordinarily difficult because the attacks are so varied—and it seems that each new one is weirder than the next. Here’s the latest: a training-time attack that forces the model to exhibit a point of view: Spinning Language Models: Risks of Propaganda-As-A-Service and Countermeasures …
Interesting research: “ ImpNet: Imperceptible and blackbox-undetectable backdoors in compiled neural networks, by Tim Clifford, Ilia Shumailov, Yiren Zhao, Ross Anderson, and Robert Mullins: Abstract : Early backdoor attacks against machine learning set off an arms race in attack and defence development. Defences have since appeared demonstrating some ability to detect …
Enlarge / Censored medical images found in the LAION-5B data set used to train AI. The black bars and distortion have been added. (credit: Ars Technica) Late last week, a California-based AI artist who goes by the name Lapine discovered private medical record photos taken by her doctor in 2013 …
Enlarge / Ring camera images give you a view of what's happening and, in one security firm's experiments, a good base for machine learning surveillance. (credit: Ring) Amazon quietly but quickly patched a vulnerability in its Ring app that could have exposed users' camera recordings and other data, according to …