🥟 Chao-Down #303 Anthropic bypasses LLM security measures with "many-shot jailbreaking", Amazon offers startups free credits to use AI models, Business schools go all-in on teaching AI
Plus, artists sign a collective letter saying AI "devalues" human art.
Anthropic might have uncovered one of the downsides to increased context windows in large language models, and it can pose significant risks to companies looking to deploy AI safely.
The token limit, which governs how much text an LLM can receive, has grown substantially over the past year, with some models now capable of processing inputs as large as several novels (1 million+ tokens). Using a technique called “many-shot jailbreaking”, researchers exploited this large context window by inserting numerous faux dialogues into a single prompt, which can manipulate LLMs to generate harmful responses, bypassing their safety training.
In effect, when a prompt is packed with dangerous questions like “how do I hijack a car” or “how do I counterfeit money”, each paired with a sample answer, the AI can be led to disregard its safety and ethical filters and answer in kind.
The research highlights the double-edged nature of advancements in AI, where increased capabilities also introduce new risks. Anthropic has since reported on the issue and warned fellow researchers of the exploit, but it goes to show how far we are from truly understanding how these complex AI models work.
-Alex, your resident Chaos Coordinator.
What happened in AI? 📰
Amazon offers free credits for startups to use AI models including Anthropic (Reuters)
Anthropic researchers wear down AI ethics with repeated questions (TechCrunch)
Nicki Minaj, Stevie Wonder, Others Sign Letter Claiming AI 'Devalues' Human Artists (Forbes)
Business Schools Are Going All In on AI (WSJ)
Americans increasingly using ChatGPT, but few trust its 2024 election information (Pew Research Center)
Microsoft is working on an Xbox AI chatbot (The Verge)
Always be Learnin’ 📕 📖
How AI Reshapes Vocabulary: Unveiling the Most Used Terms Related to the Technology (everypixel.com)
Beyond RPA: How LLMs are ushering in a new era of intelligent process automation (Foundation Capital)
Yelp’s AI pipeline for inappropriate language detection in reviews (Yelp)
Projects to Keep an Eye On 🛠
KhoomeiK/interrupting-cow: 🐮📢 The first AI voice assistant that interrupts *you* (GitHub)
mustafaaljadery/lightning-whisper-mlx: An extremely fast implementation of whisper optimized for Apple Silicon using MLX. (GitHub)
Lamini LLM Photographic Memory Evaluation Suite (Lamini)
The Latest in AI Research 💡
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs (arxiv)
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models (arxiv)
LLM4Decompile: Decompiling Binary Code with Large Language Models (arxiv)
The World Outside of AI 🌎
Amazon gives up on no-checkout shopping in its grocery stores (The Verge)
How to make an old immune system young again (Nature)
Scientists made broadband internet go 4.5 million times faster (qz.com)
It’s Getting Harder for Women to Deliver Babies in China (Bloomberg)
Pregnancy Increases Biological Age, but Giving Birth Changes it Back (Scientific American)
Older Americans are happier, wealthier and less lonely (Axios)