Have you wondered what sort of websites and data are used to train large language models like ChatGPT?
One of those datasets, Google’s C4 (Colossal Clean Crawled Corpus), is a massive text corpus built by crawling and scraping the web. It was used to train models like Google’s T5 and Facebook’s LLaMA.
The Washington Post built a search tool that lets you look up exactly which websites are in it.

It’s an interesting look at what is behind the “brains” of models like ChatGPT, and it can help us better understand where these models get their knowledge.
-Alex, your resident Chaos Coordinator.
What happened in AI? 📰
ChatGPT Can Decode Fed Statements, Predict Stock Moves From Headlines - (Bloomberg)
Google to deploy generative AI to create sophisticated ad campaigns - (Financial Times)
IBM and Moderna explore quantum computing and generative AI with new partnership - (VentureBeat)
Google’s Bard AI chatbot can now generate and debug code - (TechCrunch)
Department of Homeland Security Announces First-Ever AI Task Force - (Nextgov)
Always be Learnin’ 📕 📖
Prompt Engineering vs. Blind Prompting – (Mitchell Hashimoto)
derwiki/layoff-runbook: Being laid off can be overwhelming, and it's easy to miss important tasks. This runbook will help make sure you stay on track. (github.com)
Understanding Database Types - by Alex Xu (bytebytego.com)

Projects to Keep an Eye On 🛠
Humane’s wearable screenless AI assistant leaks in first demo clips - (The Verge)
Introducing W&B Prompts - Weights & Biases (wandb.ai)
The Latest in AI Research 💡
A Theory on Adam Instability in Large-Scale Machine Learning (arxiv.org)
Learning Neural Duplex Radiance Fields for Real-Time View Synthesis (link)

The World Outside of AI 🌎
90% of My Skills Are Now Worth $0 - by Kent Beck (substack.com)
Gen Z Is Buying up Homes. They Got Lucky, and Millennials Got Screwed. (businessinsider.com)
School Chromebooks are creating huge amounts of e-waste - (The Verge)
Impersonators Run Wild on Twitter Thanks to Sabotaged Verification System (yahoo.com)