Robots are finally learning to see — and it's transforming everything

Robots have been able to move for decades. The hard part has always been getting them to see.

Last year, AI robotics startups raised $6.1 billion in venture capital funding, up 19% year over year, as machine learning engineers pioneer new technologies to make machines smarter, with the promise of bringing us into a sci-fi future where robots can operate autonomously.

“This will be…the next industrial revolution where we have basically an infinite amount of physical labor at a really low price. This will fundamentally change the world,” said Ralf Gulde, CEO and co-founder of Sereact, a Stuttgart-based startup developing AI models designed to “give a brain to robots.”

The company is developing a vision language action model, or VLAM, an AI architecture that lets robots interpret their visual environment, receive operator instructions given in natural language and translate those into actions.

Tech giants like Google, NVIDIA and Tesla, along with a host of specialized startups, are trying to crack how to build robust and reliable VLAMs, seeking to lead the next era of automation with machines that need less and less human intervention to operate.

But building this technology isn’t just theoretically challenging: it also requires processing huge amounts of data and having access to considerable amounts of compute to keep improving these new robotic brains.

Visual AI learns differently

Unlike language models that are trained on fixed text datasets, VLAMs learn through continuous interaction with their environment, getting feedback from how their actions change what they see. This creates new challenges, such as delays in sensor responses, noisy visual inputs, and the need for the model to constantly adapt its behavior based on the results of its own actions.

Sereact is building its VLAM to be hardware agnostic, so that it can operate any kind of robot assigned to any task, but its first commercial focus is on “pick and pack” machines that sort and package items in warehouses and factories.

“We’re researching our own VLAM with our frontier lab, but we also have this clear, early commercialization strategy, with the narrowed down initial use case,” Gulde said, explaining how the company’s tech is already deployed across more than 100 machines with clients including automakers BMW and Daimler Truck and e-commerce logistics provider Zenfulfillment.

Another startup using VLAM technology is Berlin-based Sensmore. It’s building an operating system for clients operating heavy machinery like bulldozers, haul trucks and excavators. It retrofits the machines with hardware to let them run autonomously, and has deployed with cement and concrete company Cemex.

“Our customers’ workforce tends to be closer to retirement age, and the number of fresh starters is very limited, so labor shortages are a big reason for wanting automation. Secondly, it’s also about efficiency gains, to either produce the same output in less time or to produce more outputs in the same time,” said co-founder and CEO Max Rolf. “The third point is that these heavy machines are exposed to quite dangerous conditions, so if you can automate the machine and take out the human operator, that has safety benefits too.”

The computing power challenge

While companies like these see huge opportunity in industrial automation, they also have big data and compute loads to manage. These models are trained on information from a range of sources, including video cameras and sensors that monitor robotic movements, as well as text and photographs.

According to Sereact, its 100 deployed systems generate “hundreds of gigabytes” of data every day, and these kinds of heavy workflows have spawned a new service category in AI, with startups like Rerun, Roboflow and Voxel51 offering platforms for data management.

Sereact’s co-founder and CTO Marc Tuscher said that the company runs 100 NVIDIA H100 GPUs “full time” to constantly improve its AI model, and that this figure could increase three-fold as the company scales. He hopes that cloud providers can help companies like his by making compute “predictable and frictionless.”

“We don’t need fancy dashboards, we need guaranteed access to high-end GPUs and seamless bridging between on-prem and cloud,” Tuscher said, adding that VLAMs require low-latency compute, as the robots they control must adapt in real time to data they’re gathering from their environment. “Robotics AI depends on continuous, distributed learning from the physical world, and that only works if compute and data movement stay smooth and scalable.”

The company said it currently manages most of its compute needs with in-house GPUs, but that it will likely require more capacity from cloud providers as it expands its automation use cases.

A new robotics industry

If cloud providers can support AI companies like these, the US and Europe can compete in the robotics market that has historically been dominated by Asia, said Rick Hao, founder and managing partner at London-based venture capital fund Ruya Ventures.

“Japan, South Korea and China are very advanced in robotics manufacturing, and it’s much more cost effective for them to produce the hardware,” he said, adding that he’s made two stealth investments in European companies applying AI to machines made in Asia.

Hao also believes that, while many companies rely on Silicon Valley-based hyperscalers for their cloud computing, European businesses working in sensitive industries like defense will require providers that are closer to home. “That will definitely require some tech sovereignty,” he said.

The robotics industry’s lofty promises around fully automated, general-purpose machines are unlikely to come true in the short term, Hao said. Yet if companies working on VLAMs succeed, it could be the technology that justifies the huge hype surrounding AI, by transforming productivity and approaches to labor in global economies.
