🥟 Chao-Down #163: Microsoft patents an AI backpack, Twitter (X) updates its privacy policy to allow the use of public data to train AI, and a growing number of websites block OpenAI's GPTBot and other web scrapers
Plus, scientists develop an AI that can predict the smell of chemicals from their structures.
OpenAI recently unveiled GPTBot, its web crawler that collects data to train its large language models, such as GPT-4. GPTBot crawls the web and scrapes publicly accessible data, while excluding content such as paywalled material and sensitive information.
A key feature of the tool is that websites concerned about data scraping can opt out by blocking the bot, either by blocking the IP addresses it crawls from or by adding a rule for it to their robots.txt file.
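For the technically curious, here is a minimal sketch of both opt-out methods, using only Python's standard library. The robots.txt rule targets the `GPTBot` user-agent token; the catch-all rule that still admits other crawlers is an assumption for illustration, and the IP ranges below are hypothetical placeholders (RFC 5737 documentation ranges), not GPTBot's real addresses.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# robots.txt that disallows GPTBot site-wide while allowing every other crawler.
# The "User-agent: *" section is an illustrative assumption, not a requirement.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# HYPOTHETICAL blocklist: RFC 5737 documentation ranges, NOT GPTBot's real ranges.
# Replace these with the ranges the crawler operator actually publishes.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_blocked_ip(client_ip: str) -> bool:
    """Return True if the client's address falls inside any blocked range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_RANGES)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    # Pass the crawler's product token ("GPTBot") as the user agent.
    print(parser.can_fetch("GPTBot", "https://example.com/articles/1"))        # False
    print(parser.can_fetch("SomeOtherBot", "https://example.com/articles/1"))  # True
    print(is_blocked_ip("192.0.2.17"))   # True
    print(is_blocked_ip("203.0.113.5"))  # False
```

Note that robots.txt is a request that well-behaved crawlers choose to honor, while IP blocking is enforced on the server side, which is why sites often use both.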
According to data from Originality.AI, the world's top websites are using that option to actively block GPTBot and similar web crawlers: more than 18.6% of the world's top 1,000 websites now do so.
Don’t expect this trend to stop. As AI advances, content creators and website owners are becoming more sophisticated about protecting their work from future scraping.
-Alex, your resident Chaos Coordinator.
What happened in AI? 📰
AI predicts chemicals’ smells from their structures (nature.com)
This S.F. AI firm is leasing the entire former Slack HQ (sfchronicle.com)
US should use chip leadership to enforce AI standards, says Mustafa Suleyman (Financial Times)
X’s privacy policy confirms it will use public data to train AI models (TechCrunch)
Websites That Have Blocked OpenAI’s GPTBot: 1000 Website Study (Originality.AI)
Microsoft filed a patent for an AI backpack straight out of a sci-fi movie (ZDNET)
Always be Learnin’ 📕 📖
How AI voice ordering through DoorDash can be applied to all businesses (Substack)
Building Performant RAG Applications for Production - LlamaIndex 🦙 0.8.19 (gpt-index.readthedocs.io)
How (not) to get people to download your app 🫣 (builtformars.com)
Projects to Keep an Eye On 🛠
langfuse/langfuse: Open source observability and analytics for LLM applications (GitHub)
stanfordnlp/dspy: DSPy: The framework for programming with foundation models (GitHub)
Dataherald/dataherald: Dataherald - Query your structured data in natural language. (GitHub)
The Latest in AI Research 💡
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory (arxiv)
Flexible Techniques for Differentiable Rendering with 3D Gaussians (leonidk.com)
DiffSmooth: Certifiably Robust Learning via Diffusion Models and Local Smoothing (arxiv)
The World Outside of AI 🌎
China's urban youth unemployment crisis (cnbc.com)
X, formerly Twitter, to collect biometric and employment data (BBC News)
Blur, OpenSea, other marketplaces fight over a shrinking NFT market (axios.com)
How to live longer than your parents (telegraph.co.uk)
Vaccinating Adults Should Be Easier (The Atlantic)