🥟 Chao-Down #144 The fast rise of AI-ready data centers, Alibaba open-sources AI model in bid to challenge Meta, The political biases prevalent in large language models

Plus, OpenAI now lets you block its web crawler from scraping your data

OpenAI has publicly released a new web crawler, GPTBot, whose data will be used to improve its AI models, such as GPT-4 and the eventual GPT-5.

According to OpenAI, the bot should strictly filter out any paywall-restricted sources, information that violates OpenAI’s policies, or sources that gather personally identifiable information.

Website and content owners can opt out of having their content scraped by adding these two lines to their robots.txt file:

User-agent: GPTBot
Disallow: /
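
If you want to sanity-check the rule before deploying it, here is a minimal sketch using Python's standard-library urllib.robotparser. The helper name is_allowed, the example.com URL, and the SomeOtherBot user agent are placeholders for illustration, not anything OpenAI publishes. It only shows how a standards-following parser reads the rule; it says nothing about how GPTBot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# The opt-out rule from above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; it parses the rules from a
    string instead of fetching them over the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/posts/some-article"  # placeholder URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", url))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

Note that the rule only names GPTBot, so any crawler using a different user agent is unaffected, and robots.txt in general is a request that crawlers can choose to ignore.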

To me, while it’s great that one actor (i.e., OpenAI) has given us these guidelines, what about the countless other developers, hackers, or entities who may not have such good intentions?

It goes to show that you can safely assume anything you put online will likely end up in some training set for a future AI.

-Alex, your resident Chaos Coordinator.

What happened in AI? 📰

Alibaba launches open-sourced A.I. model in challenge to Meta (CNBC)

Salesforce Einstein Studio lets you bring your own model, starting with Amazon SageMaker (TechCrunch)

AI-Ready Data Centers Are Poised for Fast Growth (WSJ)

AI language models are rife with political biases (MIT Technology Review)

Now you can block OpenAI’s web crawler (The Verge)

Microsoft’s AI Red Team Has Already Made the Case for Itself (WIRED)

Always be Learnin’ 📕 📖

Catching up on the weird world of LLMs (simonwillison.net)

Protobufs Explained, and why Google, Apple, and LinkedIn use it over JSON (devmoh.co)

How Amazon tackles a multi-billion dollar bot problem (Substack)

Projects to Keep an Eye On 🛠

LISA: Reasoning Segmentation via Large Language Model (Github)

umd-huang-lab/perceptionCLIP: Code for our paper "More Context, Less Distraction: Visual Classification by Inferring and Conditioning on Contextual Attributes" (Github)

ErwannMillon/Color-diffusion: A diffusion model to colorize black and white images (Github)

The Latest in AI Research 💡

A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards (arxiv)

Baby's CoThought: Leveraging Large Language Models for Enhanced Reasoning in Compact Models (arxiv)

Wider and Deeper LLM Networks are Fairer LLM Evaluators (arxiv)

The World Outside of AI 🌎

Life-Threatening Heat Waves Are Triggering Covid-Like Shutdowns (Bloomberg)

Google offers on-campus hotel 'special' to lure workers back in (CNBC)

Unlimited miles and nights: Vulnerability found in rewards programs (Ars Technica)

Saudi Arabia has the most profitable company in the history of the world, and $3.2 trillion to invest by 2030. Who will say no to that tidal wave of cash? (Fortune)

The most misunderstood concept in psychology: What are boundaries? (The Atlantic)

Why young employees are not buying into the American Dream (Axios)

One Last Bite 😋

Clip from the Humans of AI Podcast: “Tara Walker: Blending Math and Music to Building Vector Databases.” Watch the full interview here.