🥟 Chao-Down #137 Researchers uncover universal ways to jailbreak large language models, Hollywood studios search for AI specialists amid actor and writer strike, The race to find the new Turing test
Plus, a look at the future of AI-powered autonomous warfare and weaponry.
Are we going to see more jailbreaks in our AI systems?
New research from Carnegie Mellon University and the Center for AI Safety in San Francisco has found that the guardrails put up by large language model providers can be easily defeated by clever prompt engineering techniques.
The researchers found that by appending a long suffix of characters to an English-language prompt, they could break through the guardrails of open-source language models. As a result, a chatbot can be made to answer requests it would normally refuse, such as “how to build a bomb.”
But it’s not just the open-source models that are at risk. Even closed-source models like GPT-4, PaLM, and Claude can be susceptible to the same jailbreaking methods.
The research is definitely worth a read for anyone interested in the latest on AI safety.
-Alex, your resident Chaos Coordinator.
What happened in AI? 📰
Researchers Poke Holes in Safety Controls of ChatGPT and Other Chatbots (The New York Times)
AI Chatbots Are The New Job Interviewers (Forbes)
ChatGPT broke the Turing test — the race is on for new ways to assess AI (Nature)
The AI-Powered, Totally Autonomous Future of War Is Here (WIRED)
Our Oppenheimer Moment: The Creation of A.I. Weapons (The New York Times)
70% Of Generative AI Startups Rely On Google Cloud, AI Capabilities (Forbes)
Studios Quietly Go on Hiring Spree for AI Specialist Jobs Amid Strike (The Hollywood Reporter)
Always be Learnin’ 📕 📖
Raising Prices For Your Product: Should You Do It? If So, How? (Entrepreneur's Handbook)
From individual contributor to engineering manager: debunking 8 most popular misconceptions (thecaringtechie.com)
Money on Autopilot: The Future of AI x Personal Finance (a16z.com)
Projects to Keep an Eye On 🛠
Announcing OverflowAI (Stack Overflow Blog)
jiawen-zhu/HQTrack: Tracking Anything in High Quality (GitHub)
Adobe launches ‘Generative Expand’ AI feature for Photoshop beta testers (9to5Mac)
The Latest in AI Research 💡
Universal and Transferable Attacks on Aligned Language Models (llm-attacks.org)
Generalization on the Unseen, Logic Reasoning and Degree Curriculum (arXiv)
UCSC-VLAA/SwinMM: [MICCAI 2023] SwinMM: Masked Multi-view with Swin Transformers for 3D Medical Image Segmentation (GitHub)
Adapting to game trees in zero-sum imperfect information games (arXiv)
The World Outside of AI 🌎
Largest electric grid operator in US issues alert as temperatures climb (The Hill)
Tesla's rivals joined forces to expand EV charging network (Quartz)
Regulators unveil sweeping capital rules changes for big banks (CNBC)
The IBM mainframe: How it runs and why it survives (Ars Technica)
Americans Are Moving Toward Climate Danger in Search of Cheaper Homes (Bloomberg)