From Chatbots to World Models: The Push for Physical AI
Forget agents. The shift from intent to consequences is the biggest thing happening in AI right now.
Last week Paris-based AMI Labs—founded by Yann LeCun after leaving Meta—closed a $1.03 billion seed round at a $3.5 billion valuation.
It was Europe’s largest seed round ever.
AMI Labs is the latest in a series of startups tackling a major gap in current generative AI models: they don’t know basic physics.
Obviously a bit of a no-go for any real-world deployment of autonomous AI systems.
Because when your AI system is manipulating a ten-ton mining rig or navigating a self-driving truck through a construction zone, you damn well want to make sure the system understands the consequences of its actions.
It’s a prerequisite for what the industry calls “physical AI”—AI systems manipulating objects and interacting with the real world.
As Qasar Younis, CEO of Applied Intuition, put it on Lenny’s Podcast:
The real impact of AI in the next 5 to 10 years really is going to be in farming, mining, construction. These industries need autonomy, and it couldn’t come soon enough.
Over the past 12 months, nearly $6 billion in funding has been committed to companies building AI that understands the physical world—SSI ($2B), Skild AI ($1.4B), AMI Labs ($1.03B), World Labs ($1B), Runway ($315M), Decart ($100M).
And this number doesn’t include the massive effort and capital expenditures at DeepMind, OpenAI, NVIDIA, or Meta.
From predicting intent to understanding consequences
It’s fair to say this is a big paradigm shift from the current generation of models.
All your ChatGPTs, Claudes, Geminis and so on work around human intent.
They are designed and trained with the idea that there will be a human present to tell them what to do, and that someone will be available to evaluate their output.
So even though frontier models like Claude Opus 4.6 can handle tasks with a great deal of autonomy, they operate in a world of meaning. Their domain is that of human constructs: finance, marketing, product, literature, monetary theory, and so on.
In other words, they inhabit a world of bits and bytes, not one made of atoms.
Your typical space mining rig, on the other hand, should know as little as possible about Dostoyevsky—and as much as possible about the physics of navigating zero-g environments.
There are some huge benefits to letting AI systems handle objects and manage complex tasks. For one, their information processing bandwidth is less constrained by the interface with human biology, allowing them to operate at superhuman speeds.
For example, when DeepMind pitted its AlphaStar system against humans in StarCraft II, the research team capped the AI at 22 actions per 5 seconds and forced it to use a camera view (instead of seeing the whole map). It still won.
And even more important, these systems can be deployed in hazardous and toxic environments without the risk of loss of life—or the costs associated with bio-proofing equipment.
Closer to home, this is already playing out. Based on data collected from extensive, long-running trials by Waymo and Tesla, most of the roughly 30,000 annual traffic-related deaths in the US could be prevented by letting autonomous systems drive.
The map is now the territory
Waymo went at it by using LiDAR to build meticulous maps of the cities in which its self-driving cars are deployed.
Tesla, on the other hand, stuck to cheap sensors and general computer vision. Their map is their immediate surroundings—available on your dashboard console if you care to look.
There is a famous saying attributed to Alfred Korzybski: the map is not the territory. It’s a reminder not to confuse our representations of reality (maps, words, images, etc.) with reality itself.
With autonomous systems, it’s a bit different.
How a system represents the world is as important as the actions it ends up taking.
Its simulation, its world model, determines how well it can plan and how well it can learn useful behavior. Which in turn shapes the system’s ability to undertake meaningful actions without negative consequences.
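To make the planning point concrete, here is a minimal, illustrative sketch of model-based planning (all names, the toy dynamics, and the reward are mine, not any lab’s code): the agent rolls candidate action sequences through its world model and keeps the one with the best imagined consequences. A better world model means better-ranked candidates, which is exactly why the quality of the representation matters.

```python
import numpy as np

def plan(world_model, reward_fn, state, horizon=10, n_candidates=100, rng=None):
    """Random-shooting planner: simulate candidate action sequences inside
    the world model and return the first action of the best-scoring rollout."""
    rng = rng or np.random.default_rng(0)
    best_score, best_first_action = -np.inf, None
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, 2))  # toy 2-D actions
        s, score = state, 0.0
        for a in actions:
            s = world_model(s, a)   # imagined next state, not the real world
            score += reward_fn(s)   # score the imagined consequence
        if score > best_score:
            best_score, best_first_action = score, actions[0]
    return best_first_action

# Toy stand-ins: a point that drifts with its action, rewarded for nearing the origin.
toy_model = lambda s, a: s + 0.1 * a
toy_reward = lambda s: -np.linalg.norm(s)

action = plan(toy_model, toy_reward, np.array([1.0, 1.0]))
```

Note that the planner never touches the real environment: every consequence is evaluated inside the model, which is why a model that lets balls pass through walls will happily plan actions that do the same.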

World models
This is not a new problem; AI researchers have been working on it for decades.
From dreaming roads in the 2018 World Models paper to hallucinating Atari games for high scores, progress was steady until it wasn’t.
On the back of the massive investments in AI of recent years, world models have gained renewed interest—and funding!—because of the massive value they can unlock.
Self-driving trucks, autonomous mining systems, self-learning farms etc.
They can replace humans in hazardous, high-value tasks at a time when the global population is aging.
Those investments in building world models are paying off: the first world models are now commercially available, with Project Genie from Google DeepMind and Marble from Fei-Fei Li’s World Labs among the most advanced.
In fact, I’m planning to integrate world models into my AI filmmaking studio later this month—to give creative agencies better camera controls.
But unlike with the models powering ChatGPT, Gemini, Claude and so on, researchers haven’t converged on how to model worlds.
There are several approaches being pioneered right now:
Video-to-video. DeepMind, OpenAI, NVIDIA, and Runway generate interactive worlds frame by frame. Visually spectacular—but still prone to hallucination. A ball might pass through a wall.
Latent spaces. LeCun’s approach at AMI Labs. Instead of generating pixels, predict what situations mean in abstract space. Uses 50% fewer parameters, aims to give machines a real understanding of the physical world.
Native 3D. Fei-Fei Li’s World Labs generates true 3D geometry using Gaussian splatting—persistent, exportable, editable. Less suitable for autonomous systems, but great at producing actual 3D worlds you can plug into simulation engines.
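A toy sketch of the difference between the first two approaches (purely illustrative; the “predictors” here just copy the previous frame or embedding and stand in for real learned models): a pixel-space model has to score its prediction against every pixel of the next frame, while a JEPA-style model scores it against a much smaller embedding.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, D = 64, 64, 32  # toy frame size and latent dimension

frame, next_frame = rng.random((H, W)), rng.random((H, W))

# Video-to-video style: predict every pixel of the next frame.
pixel_pred = frame  # dummy "copy last frame" predictor
pixel_loss = np.mean((pixel_pred - next_frame) ** 2)  # error over 64*64 = 4096 values

# JEPA style: encode both frames, predict only the next *embedding*.
W_enc = rng.standard_normal((H * W, D)) / np.sqrt(H * W)  # frozen random encoder (toy)
encode = lambda x: x.reshape(-1) @ W_enc
z, z_next = encode(frame), encode(next_frame)
latent_pred = z  # dummy "copy last embedding" predictor
latent_loss = np.mean((latent_pred - z_next) ** 2)  # error over only 32 values
```

The latent route never has to render texture, lighting, or other pixel detail that is irrelevant to what happens next, which is the intuition behind the parameter savings the latent-space camp claims.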
Why physical AI matters in more ways than one
Regardless of which research path will eventually come out on top—and it will probably be a combination of all these approaches—physical AI is a space to keep your eye on.
Back in 2024 I called LLMs the “steam engines of knowledge work”.
I was wrong in more ways than one.
First of all, because as a technology, LLMs are more akin to personal transportation: the skill of the driver matters as much as the machine you use (hence my AI operators program).
And second, because knowledge work, as economically important as it is, is not nearly as fundamental to our lives as the industry verticals in which physical AI will have an impact—agriculture, construction, mining, space exploration and so on.
The autonomous systems of the future will need to do more than just divine human intent. They need to understand the consequences of their actions.
The shift from intent to consequences isn’t just happening in robotics and self-driving cars. It’s happening in how businesses use AI too. The companies that treat AI as a chatbot will get left behind by those that build systems that understand their world—their workflows, their data, their decision chains. AI Operators is a 4-week, 1-on-1 program where we build that system together. Reply to this email or DM me on Substack to get started.
Last week in AI
As discussed, Yann LeCun’s new startup AMI Labs closed a $1.03 billion seed round at a $3.5 billion pre-money valuation. The company is building world models based on his Joint Embedding Predictive Architecture (JEPA): systems that learn abstract representations of physical reality through sensors and cameras rather than predicting text token-by-token.
Microsoft launched Copilot Cowork on March 9, an enterprise AI agent that autonomously executes multi-step tasks across Outlook, Teams, and Excel. It is built on Anthropic’s technology from Claude Cowork. Currently in limited Research Preview with a Frontier program expanding in late March. I wrote about Microsoft’s Copilot struggles last August—bringing in Anthropic’s agentic models is effectively an admission that the original approach was not working.
OpenAI announced its acquisition of Promptfoo on March 9, an open-source AI security startup. Promptfoo’s automated red-teaming, prompt-injection detection, and jailbreak identification will integrate into OpenAI Frontier, the enterprise agent platform launched in February.
Andrej Karpathy released autoresearch, an open-source system where AI agents autonomously run machine learning experiments overnight on a single GPU. The setup is deliberately minimal: roughly 630 lines of PyTorch code, a Markdown prompt file, and an external LLM (like Claude) that proposes changes to training code, runs 5-minute experiments, evaluates results, and commits improvements.
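Based on that description, the overnight loop might look roughly like this skeleton (the function names and the dummy scoring are mine, not Karpathy’s actual code): propose a change, run a short capped experiment, keep the result, repeat.

```python
def propose_change(history):
    # Stand-in for the external LLM call (the real system would query a model
    # like Claude with the experiment history and ask for a code tweak).
    return f"lr *= 0.9  # candidate tweak #{len(history)}"

def run_experiment(patch):
    # Stand-in for a ~5-minute capped training run on a single GPU;
    # returns a validation score for the patched training code.
    return 1.0 - 0.1 / (1 + len(patch))  # dummy deterministic metric

def overnight_loop(n_iters=3):
    history, best = [], float("-inf")
    for _ in range(n_iters):
        patch = propose_change(history)
        score = run_experiment(patch)
        history.append((patch, score))
        if score > best:
            best = score
            # In the real setup, an improvement would be committed to git here.
    return best, history

best, history = overnight_loop()
```

The interesting design choice is the tight experiment cap: 5-minute runs keep the feedback loop fast enough for an LLM to iterate dozens of times overnight on one GPU.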
NVIDIA’s GTC 2026 kicked off today in San Jose with Jensen Huang’s two-hour keynote at 11 AM PT. The expected headliner is the Rubin GPU architecture, which promises up to 288GB of HBM4 memory and 5x dense floating-point performance over Blackwell, purpose-built for trillion-parameter models and inference-heavy agentic workloads.


Check out upciti.com - in the real world now