Sergey Levine is a UC Berkeley professor and co-founder of Physical Intelligence who is building a general-purpose brain for any robotic system. He explains why training robots on a wide variety of tasks is the most scalable way to bring them into our homes and workplaces. This approach aims to solve the hardest problems in robotics by giving machines the common sense needed to navigate the unpredictable physical world.
Key takeaways
- Developing general robotic foundation models may be easier in the long run than building narrow, specialized systems for specific tasks.
- True robotic generalization often looks mundane, such as picking up plates in an unfamiliar kitchen, but it is far more difficult than creating specialized demos in controlled environments.
- A foundation model for physical intelligence could trigger a Cambrian explosion in robotics by allowing people to build applications without having to solve the core intelligence problem themselves.
- General intelligence enables the development of robots at extreme scales, from massive machines to microscopic medical tools, that are not restricted by human form.
- The future of robotic surgery involves moving beyond teleoperation so that machines are no longer limited by the speed or dexterity of a human controller.
- The primary challenge in robotics is creating cost-effective systems that can handle rare long-tail scenarios without needing massive new datasets for every task.
- Multimodal language models provide a path to giving robots common sense by allowing them to leverage general knowledge to navigate situations they have never physically experienced.
- The most powerful AI systems merge the broad prior knowledge of generative models with the superhuman optimization found in reinforcement learning.
- Sophisticated software can overcome basic hardware limitations. For example, simple cameras can function as touch sensors by visually tracking how objects deform.
- Robotics is moving from a physical bottleneck to a reasoning bottleneck, where the challenge is no longer how the robot moves, but how it interprets the scene to choose the next step.
- Common sense in robotics is the opposite of muscle memory. It is the ability to apply abstract knowledge or facts to a specific physical situation to make a correct decision.
- True generality in robotics comes from systems that can improve autonomously through their own experience rather than relying on human engineers or manual data labeling.
- Tasks like making espresso or folding laundry serve as difficult challenges that push the limits of general-purpose robots rather than as ends in themselves.
- The true test of robotic intelligence is performing mundane human tasks like washing a greasy pan or using a plastic bag, which are paradoxically difficult for machines.
- Robots can surpass human speed by using reinforcement learning to identify and remove the mental processing pauses that humans naturally take during complex tasks.
- True physical intelligence is agnostic to the body. A single foundation model should be able to control any form factor by treating every machine as part of the same general problem.
- The bitter lesson suggests that AI reaches its greatest potential when researchers stop trying to program human logic into the machine and instead allow it to learn entirely from data.
- Moravec's paradox shows that tasks humans find most natural, like physical caregiving, are the most difficult for robots because we are highly evolved for physical intelligence.
- Robotic hardware costs have plummeted from $400,000 to roughly $3,000 per arm in just one decade.
- Robotics faces an activation energy problem where robots must be useful enough to deploy before they can gather the real world data required for large scale improvement.
Defining physical intelligence through general foundation models
Sergey Levine defines the mission of Physical Intelligence as creating robotic foundation models that can control any physical system to perform any task. Just as language models are evolving to handle any task expressed in words, these new models aim to operate any physically actuated device. There is a strong belief that pursuing full generality is actually simpler in the long run than focusing on narrow, specific applications.
Part of the thesis of this company is that we believe that doing it at the full level of generality might actually in the long run be easier than trying to special case very specific narrow application domains.
This approach follows the path of natural language processing. Previously, experts built specialized systems for tasks like translation by focusing on specific linguistic differences. However, general language models eventually took over because they could leverage much broader data sources. By learning from diverse, weakly labeled data, these models develop a deep understanding of the world. This foundation makes it much more effective to build specific applications on top of a general system.
Building a foundation model for physical intelligence
Robotics faces a unique challenge because it lacks a dataset as massive as the internet. To overcome this, the focus must shift from training specialized robots for single tasks, like folding laundry, to building models that understand physical interaction. Sergey suggests that humans master new skills quickly because we intuitively grasp how the world works. If a model can draw data from many sources and different types of robots, it gains a foundational understanding of physics. This makes it much easier to deploy new applications on top of that platform.
The point of generalization is that it does something relatively mundane that any human could do, but it does it in any situation. We had some demos where we showed our robot cleaning kitchens. It is cool, but if you watch an individual video out of context, it is just picking up plates. Anyone can pick up plates, except that we just put it into that home just for that demo and it never had training data from that setting.
Creating a general purpose foundation model could unlock a creative explosion in robotics. Sergey compares this potential shift to the rise of personal computers. In the past, people had to build a massive technical stack to do anything with a robot. A foundation model would provide basic functionality that others could prompt or fine tune. This allows individuals to experiment with unique designs, such as robots with five arms or systems that hang from the ceiling, without needing to solve the core intelligence problem first.
While humanoid robots capture the public imagination, they are only one possible form factor. The intelligence needed for a robot to be useful is actually similar across different bodies. Whether it is a bulldozer, a humanoid, or a swarm of quadcopters, the basics of causality and movement remain the same. Sergey believes we should not tackle intelligence in the context of one specific body. Instead, we should build general systems that can adapt to any tool needed for the job.
The fundamentals of how you interact with objects, how things move in the world, how causality works, that is all conserved for all of these different systems.
Exploring robotics beyond human scale and control
General intelligence allows for the creation of machines that go far beyond the human form. Robots could be designed at scales that are either very big or incredibly small. This flexibility opens up exciting possibilities in fields like medicine and surgery. Currently, robotic surgery relies heavily on teleoperation, where a human operator must control the machine in real time. This limits the robot to the dexterity and speed of a person.
In the long run, I think there's lots of really exciting applications in medicine and surgery where we not only might in the long run not be limited to robots that look like humans, but we might not be limited to robots that can even be controlled by humans.
Sergey suggests that the future of robotics lies in breaking these human-centric constraints. By moving past the need for direct human control, we can develop systems that function in ways and at scales that humans cannot manage themselves.
The evolution of common sense in robotics
End-to-end control for robotic systems is an old concept. Early autonomous driving systems like ALVINN used neural networks to drive on highways as far back as the late 1980s. Despite this long history, robotics has struggled with cost effectiveness. Training a robot for a specific application like washing dishes usually requires a massive amount of data. This process becomes prohibitively expensive if it must be repeated for every new task. The goal is to move toward general purpose models that require less data for each new challenge.
Multimodal language models are really good at pulling in knowledge and trying to articulate that knowledge. They are not very good at grounding that knowledge in physical situations, but they know stuff.
A major breakthrough involves handling long tail scenarios. These are unusual situations that a robot may have never encountered. Sergey points out that humans use common sense to navigate these moments. If a driver sees a sign about a gas leak, they can figure out what to do even if they have never seen that specific sign before. Robots have historically lacked this common sense. Modern multimodal language models offer a solution. These models contain vast amounts of knowledge that can help robots handle rare events. The current challenge is grounding that abstract knowledge in physical reality so the robot understands its own context and capabilities.
The evolution of robotic learning systems
The history of robotic learning is marked by several key transitions. End-to-end learning systems in the 1980s represent an early milestone. Later, the development of deep reinforcement learning in the early 2010s provided a path for machines to surpass human performance levels. Sergey explains that this capability is vital for the future of robotic systems.
Deep reinforcement learning gives us a way to go beyond human level performance, which I think will be essential for robotic systems.
Recent developments in multimodal large language models are also significant. These models bring common sense to robotic control. While it is early to define every major milestone, Sergey expects the next few years to produce even more transformative advances.
Combining generative AI and reinforcement learning for robotics
Sergey started working in robotics in 2014 at UC Berkeley. His primary interest is building AI systems that improve the more they perform a task. Ideally, a system should have no limit to how much it can improve, allowing it to eventually master any skill. Initially, Sergey explored a blank slate approach where robots practiced specific skills from scratch. This worked in controlled environments but struggled in the open world. Even when training twenty robots at once to learn collectively, they became experts in specific tasks but could not handle unusual edge cases.
The thing that I have always wanted to really figure out is how to get AI systems that get better and better the more they do things. If you can have a system that just keeps getting better and there is no limit, then it can master all the skills you want it to do.
The current goal is to combine two major breakthroughs in AI: generative models and deep reinforcement learning. Generative AI, like large language models, can reproduce human behaviors like writing or drawing. Deep reinforcement learning, like AlphaGo, can discover strategies that humans never even thought of. By merging these, robots can use the vast knowledge found on the internet while also learning to perform better than humans through direct practice.
At Physical Intelligence, Sergey uses Vision-Language-Action models. These are models trained on text and web images before being adapted to robot data. To help robots handle unexpected situations, they use a chain of thought process. When a robot is told to clean a kitchen, it first thinks about the steps it needs to take before moving. This allows it to use common sense derived from its internet training.
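The "think before moving" loop described above can be sketched in a few lines. This is a toy illustration of the control flow only: the dictionary planner and the names (`PLANS`, `plan_in_language`, `act`, `run_episode`) are invented for this sketch, not Physical Intelligence's API, and a real VLA model would produce both the plan and the actions from a single multimodal network.

```python
# Minimal sketch of the "reason in language, then act" control flow used by
# Vision-Language-Action models. Real VLA models run a multimodal transformer
# end to end; the planner and policy below are toy stand-ins, and every name
# here is an illustrative assumption rather than a real API.

PLANS = {
    # high-level "chain of thought": command -> ordered subtasks
    "clean the kitchen": ["pick up plate", "place plate in sink", "wipe counter"],
}

def plan_in_language(command: str) -> list[str]:
    """Reasoning step: decompose the command into subtasks before moving."""
    return PLANS.get(command, [command])  # unknown commands treated as atomic

def act(subtask: str, observation: dict) -> dict:
    """Policy step: map (subtask, observation) to a low-level motor command."""
    return {"subtask": subtask, "gripper": "close" if "pick" in subtask else "open"}

def run_episode(command: str, observation: dict) -> list[dict]:
    """Think first (language plan), then execute each step in order."""
    return [act(step, observation) for step in plan_in_language(command)]
```

The design point is simply that the language-level plan exists before any motor command is issued, so general knowledge absorbed during internet pretraining can shape which subtasks get executed.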
A good learning method can actually compensate for deficient sensing fairly well. The wrist cameras are essentially a touch sensor in disguise because you can see local deformations when you touch something.
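As a rough illustration of how a wrist camera could double as a touch sensor, one crude proxy is to flag contact when enough pixels near the gripper change sharply between consecutive frames. This is an assumed heuristic for intuition only, not what Physical Intelligence actually does; a learned policy would pick up such deformation cues implicitly from data.

```python
import numpy as np

def contact_from_deformation(prev_frame: np.ndarray, frame: np.ndarray,
                             roi=None, diff_thresh: float = 10.0,
                             frac_thresh: float = 0.02) -> bool:
    """Crude visual 'touch sensor' (illustrative heuristic, not a real method).

    Flags contact when the fraction of changed pixels in the region near the
    gripper exceeds `frac_thresh`. Frames are (H, W) grayscale arrays;
    `roi` is an optional (y0, y1, x0, x1) crop around the gripper tips.
    """
    if roi is not None:
        y0, y1, x0, x1 = roi
        prev_frame = prev_frame[y0:y1, x0:x1]
        frame = frame[y0:y1, x0:x1]
    # per-pixel change mask between consecutive wrist-camera frames
    changed = np.abs(frame.astype(float) - prev_frame.astype(float)) > diff_thresh
    return bool(changed.mean() > frac_thresh)
```

Restricting the check to a region of interest around the fingertips is what makes this a (very rough) stand-in for a local deformation signal rather than a generic motion detector.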
Data is the key to scaling these systems. Rather than trying to calculate the exact amount of data needed, the strategy is to make robots useful enough to be deployed. Once they are out in the world, they can collect their own data, similar to how Tesla uses its fleet of cars to gather information. This creates a flywheel effect where the robot becomes more capable as it gathers more experience.
The shift from physical dexterity to semantic reasoning
Progress in robotics is moving faster than expected in areas like dexterity. Systems are now able to perform complex physical behaviors without needing specialized engineering for each task. These models generalize well across different robot designs, adapting to various multi-fingered hands or different numbers of joints without requiring a change to the underlying code. The bottleneck in robotics is shifting from physical capability to semantic reasoning. While programming a robot to pick up an object used to be the primary challenge, machine learning makes this straightforward if data is available. The real difficulty now lies in domains that require common sense and reasoning across different levels of abstraction.
Programming something by hand to pick up any cup anywhere, that's difficult. Getting a machine learning system to do it, if you have data for it, it's actually not that difficult. And I think increasingly what we'll see is a shift where domains, where collecting data is straightforward, they actually end up falling into the easy bucket over time.
Common sense in a robotic context acts as the opposite of muscle memory. While muscle memory is an automated physical response, common sense involves applying semantic inferences and outside knowledge to a specific physical task. It is the ability to take a fact learned elsewhere and apply it to a new environment to make the right decision. This transition toward higher-level reasoning allows for a form of robotic coaching. Instead of needing more manual control data to fix a failure, researchers can now improve robot performance simply by labeling their experiences with semantic commands. This suggests that robots are becoming more limited by their ability to interpret a scene than by their physical ability to move within it.
The bottleneck had actually shifted from the lowest level, meaning the robot's ability to physically do the task, to this middle level, where now the system is more bottlenecked by its ability to interpret the scene and select the correct next step, which can be supervised with language. And that's a big deal because now that means that someone can literally talk to the robot. It's coaching, basically.
The challenges of bringing robots into the home
The biggest hurdle to having a robot in every kitchen by 2050 is the long tail of challenges involving human interaction. Technology must reach a high level of capability, but people also need to feel comfortable with it. Autonomous cars faced a similar struggle. Early versions were imperfect, and society had to decide if that level of imperfection was acceptable.
There are some tasks for robots where people will be comfortable with something that's not perfect, something that needs to learn from its mistakes. Are you comfortable with occasionally breaking your dishes? Maybe in a few years it will stop breaking those dishes, but maybe in the meantime, it's not quite there.
A home is a chaotic environment compared to a controlled space like a hotel or a restaurant. In a house, anything can happen. A robot must be able to infer what is going on and adapt quickly. It does not have to be perfect in every single task, but it must always do something sensible that humans find acceptable. This is the hardest part of the problem because physical devices affect the world around them.
To solve this, Sergey explains that the focus must be on generality. The best systems are those that can improve themselves. Hand-designed controllers are limited because they require a human engineer to make changes. Systems that rely on human labels are better but still have limits. The most advanced systems learn directly from their own experiences.
A system that learns autonomously from data that it gathers through its own experience is even more general because you don't need the human labelers. The key is this generality, particularly with respect to improvement.
This approach avoids getting stuck on specific hardware choices, like the number of cameras or sensors. Sergey remains agnostic about the exact hardware or whether the system will use a language model in the long run. The primary goal is to build a model that learns from diverse data and adapts to new situations on its own.
The dichotomy between simulation and real world data in robotics
A major unanswered question in robotics is the division between using real-world data and simulated data. This is a controversial topic where different domains use completely different strategies. Humanoid robots often perform acrobatics using a pipeline that relies almost entirely on simulation. These robots often use zero real-world data to learn their movements.
It is surprising that in these two robotic domains the dominant approaches look so different. It may be that one will win out and there is a particular approach that can handle everything in the long run. Or maybe there's some sort of synthesis of these ideas that's important.
In contrast, robotic manipulation typically uses the opposite approach. These systems use large amounts of real-world data and massive foundation models while using very little simulation. It is unclear if one method will eventually dominate all of robotics or if the two approaches will merge into a single synthesis. Sergey notes that while he has strong opinions on the best path, the research community is still looking for a definitive answer.
Prioritizing utility while pursuing cool challenges
The development of robotics often faces a choice between creating something visually impressive and something genuinely useful. While a robot performing a backflip is technically impressive, its practical application remains limited. Sergey explains that his strategy focuses on utility first. He makes decisions based on what will advance a general, broadly applicable robotic foundation model.
The strategy we've taken is: subject to the constraint that it's useful, make it as cool as possible. We make decisions first and foremost based on our assessment of what will drive the tech forward towards this truly general, broadly applicable robotic foundation model.
To test these general systems, Sergey and his team use demanding tasks that also happen to be engaging. Challenges like making espresso or folding laundry were not the primary goals. Instead, they serve as high-stress tests to see how far the technology can be pushed. This ensures the robots are not just performing tricks. They are developing skills that translate to real-world value.
Testing robot generality through everyday challenges
The true test for robots is not running a track or jumping hurdles. Instead, the real challenge lies in mundane tasks that humans find effortless. A list of everyday tasks, such as washing a greasy frying pan or using a plastic bag to pick up dog waste, represents the frontier of robotics. These tasks are paradoxically easy for people but historically impossible for machines.
Things that people don't find particularly challenging, but that no current robotic system can do. And he listed maybe a dozen of these things.
Sergey and his team used this list of difficult tasks to test their internal model training systems. They did not develop specific code for each task. Instead, they wanted to see if their general process for onboarding new skills could handle variety. They successfully solved almost every challenge on the list.
The only failures were due to physical hardware limits rather than software. The robot grippers were too large to fit inside a dress shirt sleeve to turn it inside out. The fingers also lacked the strength to peel an orange, requiring a tool instead. This success highlights the power of generality. When a system is designed to be general, it can learn a wide range of complex activities without needing custom-built solutions for every new scenario.
One thing that I think is important to keep in mind is we didn't develop anything special for this. We literally use this as a test of our task onboarding process.
Developing general physical intelligence for robots
Robots have the potential to surpass human physical ability by eliminating the processing bottlenecks that slow us down. When a person performs a delicate task like plugging in an ethernet cable, they often pause to align the pieces and parse the situation. Sergey explains that it is relatively straightforward to identify these pauses in robot demonstrations and remove them. Through reinforcement learning, a robot can learn to succeed at a task much more quickly and efficiently than the human who taught it.
It turns out to be pretty straightforward to go in and find all those pauses and remove them. You can speed things up further so you can get to a task where a person demonstrates what it means to succeed. And then you can have the robot practice the task and succeed in the same way, but a lot more quickly, a lot more efficiently.
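One simple way to realize the pause removal Sergey describes is to threshold joint-space speed and drop near-stationary frames from a demonstration trajectory. The function below is an illustrative sketch under that assumption, not the actual method used at Physical Intelligence.

```python
import numpy as np

def remove_pauses(trajectory: np.ndarray, dt: float = 0.05,
                  speed_thresh: float = 0.01) -> np.ndarray:
    """Drop frames where the demonstrator was nearly stationary.

    trajectory: (T, D) array of joint positions sampled every `dt` seconds.
    Frames whose joint-space speed falls below `speed_thresh` are treated as
    human 'thinking pauses' and removed; the first frame is always kept.
    Illustrative heuristic only, not Physical Intelligence's method.
    """
    velocities = np.diff(trajectory, axis=0) / dt           # (T-1, D)
    speeds = np.linalg.norm(velocities, axis=1)             # speed per transition
    keep = np.concatenate([[True], speeds > speed_thresh])  # keep moving frames
    return trajectory[keep]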
In the past, the physical shape of a robot was constrained by the difficulty of the AI challenge. Changing a robot form required complex work to characterize the dynamics of the new system. However, robotic foundation models could change this by allowing researchers and hobbyists to experiment with any form factor. If the software is general enough, a user could build a robot in their garage and get it moving immediately. This shift mirrors the evolution of computers, which moved from a few fixed forms to being embedded in everything from phones to refrigerators.
Sergey believes that physical intelligence should be agnostic to the specific body being controlled. He points to studies where monkeys using tools showed brain activity at the tip of the tool rather than their own hands. This suggests that the brain treats tools as a literal extension of the body. In the same way, a single foundation model should be able to control a humanoid, a car, or a bulldozer. Rather than solving separate problems for every type of machine, the goal is to solve the one general problem of physical intelligence.
There isn't like a humanoid problem and a car problem and a bulldozer problem and a robot bolted to the table problem. There is one problem, and if you solve it at its full level of generality, that is really, really powerful.
The most significant changes in the early days of advanced robotics will likely come from widespread experimentation. Just as the accessibility of large language models allowed for a burst of creative prototypes, lowering the barrier to entry in robotics will empower smart people to rapidly iterate on new ideas. Sergey emphasizes that physical intelligence thrives on this kind of engagement and open collaboration.
The evolution of learning in robotics
The robotics community has experienced a major shift in how it views machine intelligence. In the early days, researchers debated whether learning even had a place in robotics. Traditional engineering favored programming specific knowledge of physics and safety into a robot. However, the field has increasingly internalized the idea that robots do not necessarily need a physics simulator to plan. Instead, they can use learning systems to figure out the world on their own. This leads to a concept known as the bitter lesson.
The bitter lesson says that you should not program the machine to think the way you think it should think, but you should let it learn from data. And that is not a universally accepted idea. I think there are good arguments against it, but I think that in the long run, if we want that generality, then we need it to primarily be learning from data.
One fascinating aspect of this intelligence is compositional learning. Sergey explains this using the example of a language model writing a sandwich recipe in the International Phonetic Alphabet. Even though the model has likely never seen free form text in that specific alphabet, it understands the components of language well enough to combine them. This ability to combine and mix skills to solve new problems is a goal for robotics as well.
Despite progress in software, physical tasks often highlight Moravec's paradox. This principle suggests that things humans find easy, such as physical interaction and caregiving, are actually the most difficult for AI. Tasks like changing a diaper or caring for the elderly are complex because they involve high stakes and subtle physical cues. We are so evolved for these interactions that we often underestimate their difficulty. This physical intelligence is deeply embedded in how we think. We use physical analogies for abstract ideas, such as saying a company has momentum or describing subatomic particles as having spin. These analogies are not just for explanation. They actually help us make sense of the world and lead to new conclusions.
The delicate balance of research and progress
Scientific progress in machine learning and robotics often looks like a series of major milestones. However, it is actually the result of many people trying many things. Even failures and heated controversies can be very instructive. Sergey points out that while some individuals have a history of hitting home runs, the community as a whole drives progress by pushing through bad ideas to find the good ones.
Research differs from engineering because the primary goal is to find an answer to a specific question. This often requires cutting corners and making a very delicate choice about persistence. Sergey notes that the hardest part of research is deciding whether to keep hammering at a problem or to pivot to something new.
One of the most delicate decisions in research is when do you try new things, and when do you stick with what you're already trying? That's very, very delicate. It's very, very hard to figure that out. If you get it wrong, you can miss something really remarkable. If you don't stick with something for long enough, you might be right there, you might be about to get to the answer, and then you stop just short of it.
There is no single personality type that makes a great researcher. Some people are driven by a pure love for novelty and cool ideas, while others are motivated by the desire to solve a specific, practical problem. Sergey has worked with effective researchers who do not care if their technology is useful as long as the idea is new. The only real constant is a deep passion for the work.
I've worked with people that were remarkably effective, that are just driven purely by the desire for novelty. They don't give a damn about what their technology does. They don't give a damn about whether it's useful. They just want cool new ideas. I've also worked with other people that really want to solve a particular thing.
While manufacturing and scaling are important, the current priority is reducing uncertainty. General purpose AI tools like foundation models can help figure out the software and functionality first. This ensures that when a robot is finally produced at scale, there is a high level of confidence that the technology will actually work.
The collaborative future of human and robot labor
Preparing for the rise of robotics is complex because the technology is still evolving in fundamental ways. A major uncertainty involves the balance between human demonstrations and reinforcement learning from autonomous data. Companies might need to prepare for heavy teleoperation where humans guide robots through tasks. Alternatively, they might only need a few demonstrations while the robot learns mostly on its own. The correct business approach depends heavily on which method becomes dominant.
Will the robots rely more on demonstrations or on reinforcement learning from autonomous data? We are working on both of those things and they are clearly both important. But how somebody should prepare for the technology will be pretty different.
For now, businesses should focus on the economics of labor. Coding tools serve as a useful template for how AI might change the workforce. These tools did not eliminate the need for software engineers. Instead, they increased the productivity of individual workers. Sergey believes robotics will follow a similar path. It will not be a case where a humanoid robot enters a workspace and all the people leave. Instead, there will be a collaboration. Some tasks will be done by robots, while others will require a person to make the robot more productive.
The role of robotic demos and finding business utility
The new version of the Boston Dynamics Atlas robot stands out because it balances human-like agility with engineering choices that allow for a superhuman range of motion. While some critics point out that these robots have yet to provide direct value to customers, Sergey believes these demonstrations serve a vital purpose. They illustrate the technical challenges that must be overcome and provide a clear vision for what the technology will eventually achieve.
Demos that are used correctly in service to a mission can provide people with an illustration of what to expect. And they also provide a challenge. You just have to be honest in setting up that challenge.
The path to a commercial breakthrough remains complex. While the Roomba is the most successful consumer robot to date, the industry is still searching for the next major application. Current work focuses on prototyping different tasks to see where things go wrong in real-world scenarios. This process helps narrow down the space of possibilities for future products. Sergey expects that 2026 will be a key year for experimenting with these different business endpoints and data collection strategies.
The falling cost of robotics hardware
The cost of robotics hardware has dropped significantly over the past decade. Ten years ago, a standard robot like the PR2 cost approximately $400,000. When Sergey started his lab at UC Berkeley, the price of a robot was around $30,000. Today, individual robotic arms can cost as little as $3,000. This massive price reduction makes general purpose robotics much more practical for technical development.
When I started working in robotics about a decade ago, I worked with a robot called a PR2, which I believe had a cost of about $400,000. Now each arm on this thing is maybe a tenth of that.
This shift is not the result of a single technology. It is driven by improvements in both hardware and software. Modern low cost arms lack the extreme precision required by traditional industrial control methods. However, new software approaches allow these less precise tools to be useful. This combination of cheaper hardware and smarter software is a major milestone for the field.
The challenge of staying informed in robotics research
Research papers are the primary source of information in technology. However, they are not very accessible. These papers assume the reader already understands the history of the field. This makes it hard for most people to find the real signal behind the results.
Research results are intended for an audience that already understands the starting point from all the past research results.
Public demos and social media videos can also be misleading. Sergey notes that these artifacts show the edge of capability rather than a clear sense of the true state of the technology. To understand what a demo really means, one must look closer. Often the only way to find the truth is to talk to the researchers directly. This is the reality of how science works, even if it is not a perfect situation.
The bootstrap challenge of robotic data collection
The timeline for when robots will become truly useful remains uncertain. This is largely due to a bootstrap challenge. Robots must reach a certain level of capability before they can be deployed in the real world to collect the massive amounts of data they need to improve. Sergey describes this as an activation energy problem. Once robots are useful enough to be in open world settings, they can collect data at scale. Getting to that point is a sudden and difficult step to predict.
This is something where there's a bootstrap challenge, getting to a particular level of usefulness so that robots can be deployed, so they can do useful tasks, so they can start collecting data from open world settings at scale. Because that's such a sudden event, getting past the activation energy, I think there is a lot of uncertainty about the timing of that.
There are also different strategies for gathering this data. Some methods use teleoperation, where humans control the robots, while others rely on autonomous systems or coaching. Each approach changes the timeline for deployment. People often assume that any data is good data for machine learning. Sergey warns that this is often a mistake. It is not enough to just collect videos of people performing tasks. You must have the specific kind of data that matches the technology you are trying to build.
One assumption is machine learning requires data. So let me just figure out something that will collect data. That's not often the best assumption because you need the right kind of data. Maybe some data is easy to get videos of people doing something, but that doesn't mean that's the right kind of data.
Solving mid-level reasoning and empowering researchers
The next major challenge in robotics is understanding mid-level reasoning. While we have a good sense of how to acquire low-level physical behaviors, making them generalize requires common sense knowledge. Large language models make certain representations convenient, like turning text into more text. However, an embodied system might need to think spatially or semantically, so the internal thinking process for a robot might look very different from a standard AI model's.
Sergey views himself as optimistic compared to established robotics researchers but pessimistic compared to robotics entrepreneurs. The field of robotics has a long history with very few AI successes. Most robots doing useful work today still use technology from the 1980s. Sergey remains optimistic because he can see how specific puzzle pieces might finally slot together. He acknowledges that in robotics, you often climb a mountain only to find another one standing behind it.
Inspiration often comes from groups that demonstrate the impossible or foster deep experimentation. Boston Dynamics has repeatedly forced people to revise their thoughts on what robots can achieve. Sergey also admires organizations that empower individuals to follow their own intuition. For instance, ChatGPT began as an individual experiment rather than a corporate strategy.
ChatGPT was basically John Schulman's pet experiment for a while. It wasn't a concerted corporate strategy with lots of spreadsheets and pie charts. It was a pet project. I think there's something pretty inspiring about organizations that empower people to have pet projects turn into world changing successes.
Sergey experienced this type of agency personally while working at Google. He discovered a warehouse of unused robots and asked to set them up in a lab for data collection. Despite being a junior researcher, leadership gave him the resources he needed immediately. Giving people that kind of leverage and agency is essential for unlocking creativity.
Career growth is shaped by people who bet on potential
Professional kindness often looks like taking a chance on someone. Sergey identifies three key moments where mentors and managers bet on his future potential rather than his past results. One major instance was the ARM Farm project, where leaders like Jeff and Vincent took a risk on him. Another was when he began a postdoc with Pieter at Berkeley. At that time, Sergey had no experience in robotics, having worked only in virtual character animation and computer graphics.
I felt like that was a bet on my potential more so than my actual accomplishments.
Early-career support can also come from unexpected places, like an internship at Nvidia during his sophomore year. These opportunities are easy to overlook in the moment but become clearly significant in hindsight. Providing similar openings for others is a way to pay forward the kindness received during one's own career.
In hindsight it is something that made a big difference and hopefully I can make that difference in other people's careers as well.
