When AI Controls Machines, Small Uncertainty Becomes Physical Reality
"We have watched AI transform software - chatbots, assistants, code generators. But a quieter, more consequential revolution is underway: AI is moving from screens to the physical world. And when it does, the stakes change entirely."
Full webinar video at the end of the blog.
This is the central argument Faseeh Ahmad, robotics engineer and AI researcher at System Verification, made at last week’s webinar. With a PhD in robotics and AI from Lund University and hands-on experience with industrial robot arms, autonomous systems, and the Boston Dynamics Spot robot, Ahmad brought a rare combination of academic depth and real-world engineering perspective to a question that's becoming impossible to ignore: how do we ensure reliability when AI controls physical machines?
From Rules to Probability: What Actually Changes
Traditional automation is elegantly predictable. A PLC controller takes defined inputs, applies rules set by engineers, and produces consistent, traceable outputs. If something breaks, you can follow the logic backward and find the cause. The system is, in a word, deterministic.
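To make the contrast concrete, here is a minimal sketch of that kind of deterministic control logic. The sensor names and thresholds are illustrative, not taken from any real PLC:

```python
# A toy deterministic controller: fixed rules, set by engineers.
# Same inputs always yield the same, traceable output.

def plc_step(temperature_c: float, door_closed: bool) -> str:
    if not door_closed:
        return "halt"      # safety interlock fires first
    if temperature_c > 80.0:
        return "cool"      # defined threshold, chosen by an engineer
    return "run"

# Tracing a fault backwards is trivial: replay the inputs, follow the rules.
assert plc_step(85.0, True) == "cool"
assert plc_step(85.0, False) == "halt"   # interlock overrides everything
```

If this system misbehaves, you re-run the same inputs and walk the branches until you find the rule at fault. That property is exactly what the next section shows disappearing.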
Introduce an AI module, and the architecture shifts fundamentally. Instead of rules, you have billions or even trillions of weighted parameters. Instead of defined outputs, you get probabilistic guesses. The system no longer tells you why it made a decision - it just does.
The non-determinism problem
Ask ChatGPT the same question twice, and you will likely get different answers. That is non-deterministic behaviour. In software, it is a quirk. In a robot arm operating near a human worker, it is a safety issue.
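The same behaviour is easy to reproduce in miniature. The sketch below uses a hypothetical stand-in for a learned policy; the actions and probabilities are illustrative:

```python
import random

# Stand-in for a learned policy: it returns a *sample* from a
# probability distribution, not a fixed answer.
def policy(observation: str, rng: random.Random) -> str:
    actions = ["move_left", "move_right", "stop"]
    weights = [0.45, 0.45, 0.10]   # illustrative learned probabilities
    return rng.choices(actions, weights)[0]

# Same observation, two independent runs - potentially different actions.
a1 = policy("obstacle_ahead", random.Random())
a2 = policy("obstacle_ahead", random.Random())
# a1 and a2 may disagree. In a chatbot that is a quirk;
# in a robot arm near a human worker it is a safety issue.
```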
But the deeper shift is not just about how decisions are made – it is about where systems now operate. AI gives robots the flexibility to leave controlled factory environments and enter the real world: warehouses, roads, homes. And in doing so, every assumption that made traditional automation safe simply dissolves.
Failures in the Physical World Are Different
Ahmad illustrated this with a task from his PhD research: the "peg-in-hole" problem - a simplified model of assembling a piston engine. On paper, it is straightforward. In practice, the robot faces a cascade of potential failures: a block obstructing the hole, a peg dropped mid-grasp, a misplaced tool. None of these is a planning failure.
They are execution failures - problems that only emerge while the task is actually happening.
These failures have four qualities that make them particularly difficult to handle:
01 They cascade
An error at step 3 may not surface until step 8, by which point it has compounded through the system.
02 Root cause is murky
Is it the sensor? The grasp force? The lighting? Hardware and software interact in ways that resist clean diagnosis.
03 They are inevitable
No AI system is perfect. The question is not whether failure will happen - it is whether the system can recover.
04 Physical consequences
Unlike software bugs, these failures can damage equipment, halt production lines, or injure people.
Ahmad demonstrated a live test using an ABB Yumi robot where AI autonomously detected obstacles, diagnosed failed grasps, adapted to a closing drawer, and even discovered hidden objects mid-task, all without human intervention. It was impressive. It also illustrated exactly how many things can still go wrong.
This Is a QA Crisis
Classical software quality assurance rests on a foundation of determinism: fixed inputs, reproducible bugs, and clear validation paths. That foundation does not exist in AI-driven physical systems.
You are now dealing with distribution shifts (the training data never fully represents the real world), probabilistic and non-repeatable outputs, implicit behaviours baked into billions of model parameters, and an open environment where unknown inputs are guaranteed to arrive.
The consequences are not abstract. Ahmad pointed to a pattern already visible in deployed systems: Waymo taxis getting stuck in looping reversal standoffs, Amazon warehouse robots clustering in unintended locations, humanoid robots losing balance on stage, and drone crashes despite sophisticated onboard AI. These are not failures of ambition; they are failures of reliability engineering.
Four Principles for Building Reliable AI-Physical Systems
1 — STRUCTURE AROUND YOUR LEARNED COMPONENTS
You cannot write unit tests for billions of neural network weights; it is computationally infeasible and conceptually wrong. Instead, test at the behaviour level: define what you expect the AI module to do based on its role in the system and write tests against those expectations. Abstract the black box; test its interface.
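A behaviour-level test might look like the sketch below. `predict_grasp` is a hypothetical stand-in for a real model; the contract it is tested against - output fields, value ranges, confidence bounds - comes from the module's role in the system, not from its weights:

```python
# Behaviour-level testing: treat the learned module as a black box
# and test the contract of its interface.

def predict_grasp(image: list) -> dict:
    # Placeholder: real code would run model inference here.
    return {"x": 0.1, "y": 0.2, "confidence": 0.93}

def test_grasp_contract():
    out = predict_grasp([[0.0] * 4] * 4)
    # Role-based expectations, independent of how the model works inside:
    assert set(out) == {"x", "y", "confidence"}
    assert 0.0 <= out["confidence"] <= 1.0
    assert -1.0 <= out["x"] <= 1.0 and -1.0 <= out["y"] <= 1.0

test_grasp_contract()
```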
2 — RUNTIME MONITORING IS NON-NEGOTIABLE
Static tests catch known problems. Runtime monitoring catches everything else. Continuous monitoring, integration checks, and post-deployment validation are not optional extras; they are the mechanism by which an unpredictable system remains observable.
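One common shape for such a monitor is a validation layer between the AI and the actuators. The command fields and limits below are illustrative assumptions, not a real robot API:

```python
# Runtime monitor sketch: every AI output passes through a validator
# before it reaches the hardware.

MAX_SPEED = 0.5   # m/s - example safety envelope

def monitor(command: dict) -> dict:
    """Reject commands that leave the validated envelope."""
    violations = []
    if abs(command.get("speed", 0.0)) > MAX_SPEED:
        violations.append("speed_limit")
    if command.get("target") is None:
        violations.append("missing_target")
    if violations:
        # Log the violations and fall back to a known-safe behaviour.
        return {"action": "safe_stop", "violations": violations}
    return command

assert monitor({"speed": 2.0, "target": "bin_3"})["action"] == "safe_stop"
assert monitor({"speed": 0.2, "target": "bin_3"})["speed"] == 0.2
```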
3 — BUILD IN RECOVERY FROM DAY ONE
Ahmad is direct on this point: recovery is not an afterthought. A system that can detect its own failures and attempt resolution, without waiting for a human to intervene, dramatically reduces downtime and cost. Designing that recovery requires knowing your failure modes in advance, which loops back to the QA question: what can go wrong, how, and when?
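The loop below sketches that idea under stated assumptions: failure modes are enumerated in advance, each known mode gets an autonomous retry, and only unknown failures escalate to a human. The mode names and handlers are illustrative:

```python
# Recovery designed in from day one: the executor knows its failure
# modes and tries autonomous resolution before asking for help.

RECOVERY = {
    "grasp_failed": "reopen_and_regrasp",
    "hole_blocked": "remove_obstruction",
}

def execute_with_recovery(run_step, max_attempts: int = 3):
    for attempt in range(1, max_attempts + 1):
        status = run_step()
        if status == "ok":
            return ("success", attempt)
        if status in RECOVERY:
            continue   # known failure mode: retry autonomously
        break          # unknown failure: escalate immediately
    return ("escalate_to_human", attempt)

# A step that fails once with a known mode, then succeeds:
outcomes = iter(["grasp_failed", "ok"])
assert execute_with_recovery(lambda: next(outcomes)) == ("success", 2)
```

The interesting engineering work is in the `RECOVERY` table itself - which is precisely the QA question of knowing what can go wrong, how, and when.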
4 — SYSTEM-LEVEL TESTING, NOT JUST MODULE TESTING
Test the whole system together, against real-world requirements. Individual modules that each pass their tests can still produce a system that fails - because failure in physical AI often lives in the interactions between components, not within them.
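A toy example of how locally correct modules can combine into a system-level fault - here, a unit mismatch at the interface. Everything in this sketch is illustrative:

```python
# Two modules that each pass their unit tests, wired together wrongly:
# perception reports the peg offset in metres, the planner assumes cm.

def perceive():            # unit-tested: returns peg offset in metres
    return 0.02

def plan(offset_cm):       # unit-tested: small offsets -> "insert"
    return "insert" if offset_cm < 0.5 else "realign"

def system_decision():
    return plan(perceive())   # interface bug: metres fed in as cm

# The true offset is 2 cm, so the system *should* realign - but every
# unit test passes, and the combined system happily says "insert".
assert system_decision() == "insert"
```

Only a test against the real-world requirement ("realign when the offset exceeds 5 mm") exposes the fault, because it lives in the interaction, not in either module.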
The reliability gap
Achieving Level 5 autonomous driving and making Level 5 autonomous driving reliable are two entirely separate problems. Ahmad argues the second is harder than the first, and that conflating them is one of the field's most dangerous assumptions.
The Road Ahead: Hybrid Control and Safety Layers
Ahmad's vision for the future is not to abandon AI-driven physical systems - it is to engineer them properly. He sees the most promising path in hybrid control architectures: traditional deterministic modules handling well-defined tasks, with an AI core managing learning, adaptation, and environmental understanding. Crucially, that AI core is wrapped in an explicit safety layer - a monitoring system that validates behaviour and enforces guardrails.
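In skeleton form, such a hybrid architecture might look like this. The task names, workspace limits, and the AI core's output are all illustrative assumptions:

```python
# Hybrid control sketch: deterministic rules for well-defined tasks,
# an AI core for everything else, and a safety layer wrapping the AI
# core's output before anything reaches the motors.

WORKSPACE = {"x": (-0.5, 0.5), "y": (-0.5, 0.5)}  # validated envelope

def deterministic_layer(task):
    table = {"home": {"x": 0.0, "y": 0.0}}        # fixed, traceable rules
    return table.get(task)

def ai_core(task):
    # Stand-in for a learned planner: may propose out-of-envelope moves.
    return {"x": 0.9, "y": 0.1}

def safety_layer(cmd):
    for axis, (lo, hi) in WORKSPACE.items():
        cmd[axis] = min(max(cmd[axis], lo), hi)   # enforce guardrails
    return cmd

def control(task):
    cmd = deterministic_layer(task)
    if cmd is None:                    # novel task: defer to the AI core
        cmd = safety_layer(ai_core(task))
    return cmd

assert control("home") == {"x": 0.0, "y": 0.0}    # deterministic path
assert control("fetch") == {"x": 0.5, "y": 0.1}   # AI path, clamped
```

The key property: the AI core can propose anything, but nothing it proposes bypasses the deterministic guardrails on the way to the hardware.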
On home robots: optimistic, but patient. Startups like Physical Intelligence are tackling domestic dexterity, and the hardware is improving. But reliable home robots, ones you would trust around children, elderly relatives, and fragile objects, are likely a decade out. The gap is not AI capability; it is AI reliability, combined with the physical dexterity limitations of current humanoid platforms.
Check out the full webinar: