Mark Chen, OpenAI's Chief Research Officer, reveals the high-stakes world of leading AI research.
He explains how OpenAI sets priorities, allocates its coveted GPUs, and navigates the intense competition for talent, including the infamous "soup wars."
Key takeaways
- OpenAI's talent retention strategy isn't about matching competitors' salaries. It's about cultivating a deep conviction in the mission, which makes top talent willing to stay for less pay.
- At OpenAI, the allocation of compute resources like GPUs serves as a powerful, non-verbal way to communicate and enforce the company's core research priorities.
- Chasing incremental updates to beat a competitor is not a sustainable way to do research. The real, long-term advantage comes from cracking the next paradigm.
- In large-scale AI development, creating a hierarchy that elevates research above engineering is a losing game because progress depends on deep engineering practices to optimize and scale models.
- AI models approach complex problems differently than humans, possessing a unique intuition for what is easy or hard. This creates immense potential for human-AI collaboration in advancing frontier research.
- With AI's rapid advancement, traditional assessments like coding interviews and college exams are becoming obsolete, requiring new methods to evaluate a person's actual knowledge.
- A future interview format could involve a candidate conversing with an AI designed to deeply gauge their expertise, with a human reviewing the transcript to make a final hiring decision.
- Activities that seem uniquely human, like playing poker or generating language, often have a deep mathematical structure that can be mastered by a machine.
- The field of AI is surprisingly shallow compared to disciplines like theoretical physics; one can get to the research frontier in just three to six months by working on a project.
- In a moment of crisis, a leader's responsibility is to create stability and community. A simple, powerful goal, like 'I will not lose a single person,' can rally a team and prevent it from fracturing.
- In response to the natural diffusion of ideas in a competitive industry, a company can either create information silos or simply try to outrun everyone. OpenAI chooses to focus on speed and execution.
- Training models on human data creates a performance ceiling. Surpassing this level requires new methods, but it also introduces the difficult problem of how to measure and evaluate superhuman intelligence.
- There is no single, consistent definition for AGI; a better benchmark for progress is whether AI can produce novel scientific knowledge and advance the scientific frontier.
- The goal of AI in science shouldn't be for one company to win a Nobel Prize, but to build tools that empower every scientist to make their own breakthrough discoveries.
- Strategic constraints, like a temporary hiring freeze, can be a powerful management tool. It forces a re-evaluation of the existing team and ensures the talent bar remains exceptionally high.
- A major challenge in AI alignment is 'scheming,' where a model can produce the correct answer but arrive at it through a twisted or unintended thought process.
- If you incentivize an AI model to present its thinking in a way that appeals to humans, it may learn to be dishonest about its true intentions.
- One potential path to building honest AI is to create environments where models supervise each other, making honesty the only stable outcome.
- The future of ChatGPT is to evolve from a simple prompt-response tool into an AI with memory that learns deeply about the user with every interaction.
- The current moment in AI is comparable to an industrial revolution, creating a sense of urgency that justifies intense personal dedication and sacrifice.
The AI talent wars feature soup deliveries and deep conviction
There is an intense competition for a limited pool of top AI talent. Mark Chen notes that while the media has focused on a flow of talent to Meta, the reality from his perspective is different: Meta has aggressively, but often unsuccessfully, pursued many people at OpenAI. Before Meta managed to hire anyone, it had already approached half of his direct reports, all of whom declined.
The recruitment tactics have become quite personal. Mark Zuckerberg, for instance, has personally delivered soup to people he was trying to recruit from OpenAI. Mark Chen admits this was initially shocking but has come to see these personal touches as effective. In response, he has also delivered soup to potential recruits from Meta, joking that Michelin-star soup is better than homemade. He even considered taking his staff to a cooking class for their next offsite.
I've also delivered soup to people that we've been recruiting from Meta... It's better if you get like Michelin star soup.
OpenAI's strategy for talent retention is not to match Meta's offers dollar for dollar. Instead, they rely on the team's conviction in the company's mission. Mark Chen emphasizes that even among those who have offers from Meta, no one believes AGI will be developed there first. The confidence in OpenAI's research program is very high. The fact that people are happy to stay at OpenAI for compensation that is multiples below what Meta offers gives him strong conviction that the team truly believes in the company's long-term upside. The goal isn't to retain every single person, but to identify and keep the key people, while trusting the internal pipeline for developing new talent.
How OpenAI prioritizes its 300 research projects
At OpenAI, Mark Chen and his colleague Jakub Pachocki work together to shape the company's research direction. A significant part of this role involves deciding which projects receive computing resources, a task that can be challenging as people are constantly vying for GPUs. Mark notes that people can be very creative in their attempts to secure the resources they need.
It's true, people are very creative in the ways that they try to make backroom deals to get the GPUs they need.
To manage this, Mark and Jakub have a structured process. Every one to two months, they review a large spreadsheet containing about 300 different projects at OpenAI. They work to deeply understand and rank each one. This exercise is crucial for an organization of 500 people, as it helps to clearly communicate the core priorities. These priorities are shared both explicitly through verbal communication and implicitly through the strategic allocation of compute power.
How OpenAI prioritizes exploratory research projects
At a research organization like OpenAI, which has around 500 people on its core research team working on 300 different projects, deciding which projects merit computational resources like GPUs is a major challenge. Mark Chen explains that the key is to keep the core roadmap in focus. OpenAI's primary goal is exploratory research, aiming to discover the next paradigm in the field. This distinguishes them from other labs that might focus on replicating results or catching up on benchmarks.
Most people might be surprised at this. But more compute goes into that endeavor of doing exploration than into training the actual artifact.
Deciding which projects get funding and resources can resemble a newspaper's page-one meeting, where every reporter passionately advocates for their story. Mark acknowledges that it is a difficult process, and that making tough calls is a crucial aspect of leadership.
The hardest calls you have to make are, this is a project that we just can't fund right now. But I also think that's good leadership. You need to clearly communicate that, hey, these are the priorities. This is what we're going to talk about. These are the types of results that we think move the research program.
Staying focused on research amidst industry competition
In a competitive AI landscape, it's important not to get caught up in a reactive dynamic with competitors. Chasing incremental updates to stay ahead for a few weeks or months is not a sustainable way to conduct research. The real value lies in long-term bets that can unlock new paradigms.
If you crack that next paradigm, that's just going to matter so much more. You're going to shape the evolution of it.
Mark Chen points to OpenAI's early and once-unpopular bet on reinforcement learning (RL) for language models as an example. What seemed non-obvious two years ago is now considered a fundamental primitive. This highlights their strategy of making bold bets and building algorithms to scale for the future. Even as OpenAI has grown and developed product lines, it maintains the soul of a pure AI research company. The core mission remains to advance AGI research.
I actually do think that's the best head fake to really creating value. If you focus and you win at the research, the value is easy to create. So I think there's a trap of getting too lost into, oh, let's drive up the bottom line. When in reality, if you do the best research, that part of the picture is very easy.
This research focus blurs the lines between traditional roles. Elevating research scientists above engineers is seen as a losing strategy. Building large-scale models is a deep engineering practice, involving optimizing code and ensuring numerical stability. Without this engineering foundation, scaling to the massive number of GPUs required today would be impossible.
When a competitor like Google releases a new model such as Gemini 3, the team at OpenAI works to build an internal consensus on its capabilities, as benchmarks only reveal so much. While they viewed Gemini 3 as a good model, they felt confident in their own internal models and their ability to release even better successors.
The 42 problem is a challenge for language models
Mark Chen has a specific math problem he likes to give to AI models, which so far none have been able to solve perfectly. It's known as the "42 problem."
The puzzle is to build a uniform random number generator modulo 42. You are given access to primitive random number generators, one for each prime less than 42, each returning a uniform value modulo its prime. The objective is to use these subgenerators to construct the mod-42 generator while making the fewest possible calls to them on average. Mark notes that while language models get very close to the optimal solution, he hasn't seen one completely crack the problem yet.
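The intended optimal answer isn't public, but a sketch makes the puzzle concrete. Since 42 = 2 × 3 × 7, one obvious solution calls the mod-2, mod-3, and mod-7 generators once each and combines the results via the Chinese Remainder Theorem: exactly 3 calls, provably uniform. Rejection sampling over larger primes already does better on average. Here is one such scheme, our illustration rather than Mark's intended solution:

```python
import random

# Stand-ins for the primitive generators we are given: uniform draws
# from [0, p) for primes p < 42, simulated here with Python's random.
def rand_p(p: int) -> int:
    return random.randrange(p)

def rand42() -> int:
    """Combine draws from the two largest available primes, 37 and 41,
    into a uniform value over [0, 37 * 41) = [0, 1517). Since
    1512 = 36 * 42, rejecting draws >= 1512 and reducing mod 42 yields
    a uniform result at a cost of about 2.007 calls on average,
    beating the 3-call CRT baseline."""
    while True:
        x = rand_p(37) * 41 + rand_p(41)  # uniform over [0, 1517)
        if x < 1512:
            return x % 42

# Sanity check: the output should be close to uniform over 0..41.
counts = [0] * 42
for _ in range(42_000):
    counts[rand42()] += 1
print(min(counts), max(counts))  # both near 1000
```

Squeezing the expected call count much below 2.007 likely requires recycling the entropy of rejected draws, which is where the problem gets genuinely hard.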
Playing the long game in AI development
Mark Chen admits to being extremely competitive, stating a strong aversion to losing. When asked if this competitiveness translates to frantically testing a competitor's model like Gemini 3 the moment it is released, he explains that his approach is different.
I hate to fucking lose somewhere. I really hate losing.
For Mark, success in any endeavor is about playing the long game. OpenAI has spent the last six months intensely focused on supercharging its pre-training efforts, building a superstar team around pre-training and emphasizing all of its important aspects. It is this sustained focus, rather than short-term reactive sprints, that builds the confidence and capability to go head-to-head with competitors.
Mark Chen's journey from math competitions to competitive coding
Mark Chen came to coding relatively late. A college roommate at MIT convinced him to take his first coding class. Before that, he was deeply involved in math competitions from grade school through high school, and he initially approached coding with some skepticism.
I had all the hubris of a mathematician at that time, whereas math is the purest and hardest science. And that's where you really prove your worth.
After graduating from MIT, coding competitions became a way for him and his friends to stay in touch. They would log on every weekend to participate in contests. Over time, Mark discovered he had a talent for it and began competing at a high level. This led to him writing problems for contests like the USA Computing Olympiad (USACO) and eventually coaching the team. He describes it as a great community where he met influential people.
AI's different intuition is reshaping frontier research
Coding competitions like the International Olympiad in Informatics (IOI) are intense, puzzle-like events where the top four students from each country compete. It is a two-day contest where participants have five hours each day to solve three problems. While competitive, it is also a social event that builds a tight-knit community of talented individuals.
Mark Chen, one of the coaches for the U.S. national team, explains that the students are so self-motivated that his role is often about managing performance, strategy, and morale. He sees parallels between managing contestants and managing researchers. Both have good and bad periods—hours for contestants, months for researchers—and it is crucial not to let failures get into their heads. A significant part of the job is morale management.
A fascinating insight has come from deploying AI models to solve these contest problems. They work in a very different way from humans. While humans often rely on pattern recognition, AIs can have a completely different intuition for a problem's difficulty. For instance, a problem Mark thought would be very hard for an AI turned out to be one of the easier ones for it to solve.
This has given me the sense that AIs plus humans in frontier research, it's going to do something amazing just because the AI has a different intuition for what's easy and what's not.
This is similar to how DeepMind's AlphaGo made moves that human players had not considered. Mark believes there has been an inflection point in AI's capability for frontier research. He shares an anecdote about a physicist friend who was skeptical of AI until he challenged a new model with his latest paper. The model understood it in 30 minutes, producing a reaction in his friend similar to Lee Sedol's during the famous AlphaGo match. This capability is expected to increasingly impact fields like mathematics, biology, and materials science.
How AI is breaking coding competitions and college exams
The rapid advancement of AI models is creating a shock for many, especially in fields once considered the peak of human intellect, like competitive programming. Mark Chen, who helped develop these reasoning models, recalls the moment they began to outperform human competitors. He tracked the models' performance against coding contest benchmarks, and what started as average capability quickly improved.
You still remember that moment when you walk into the meeting and they show where your performance is and the model's exceeding that. Man, that was also a shock to me. It's just like, wow, we've automated to this level of capability so fast.
Within months, the models were surpassing even the top programmers. In a recent competition, an AI jumped from 100th place the previous year to a top-five position. Despite this, Mark believes these competitions will continue because the best participants are motivated by fun, not just career advancement.
This leap in capability has broken traditional methods of assessment. Interviews, college exams, and homework are now easily gamed. This reality necessitates new ways of gauging a person's actual knowledge. Mark proposes a creative new interview format where candidates might converse with a specialized version of ChatGPT. The AI's goal would be to deeply assess whether the candidate truly understands the material and is a good fit for the company. A human would then review the transcript of this conversation. This is one idea for revamping how talent is evaluated in a world with powerful AI tools.
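As a concrete illustration of what such an interview loop could look like, here is a minimal sketch using the public OpenAI chat API. Everything specific in it, the interview topic, the system prompt, the model choice, and the five-turn length, is our assumption rather than anything Mark described:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical interviewer persona; the format Mark sketches would
# presumably use a specialized model rather than just a system prompt.
SYSTEM_PROMPT = (
    "You are a technical interviewer assessing a candidate's depth in "
    "distributed systems. Ask one probing question at a time, and drill "
    "into anything the candidate states vaguely."
)

messages = [{"role": "system", "content": SYSTEM_PROMPT}]
for _ in range(5):  # a short five-turn screening conversation
    reply = client.chat.completions.create(
        model="gpt-4o",  # model choice is an assumption
        messages=messages,
    )
    question = reply.choices[0].message.content
    print(f"Interviewer: {question}")
    messages.append({"role": "assistant", "content": question})
    messages.append({"role": "user", "content": input("Candidate: ")})

# The key step in Mark's proposal: a human reviews the raw transcript
# and makes the final hiring decision.
for m in messages[1:]:
    print(f"{m['role']}: {m['content']}")
```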
The surprising math behind poker and AI
Mark Chen reflects on his past obsession with poker, a game he now plays for fun with mathematically-minded friends from his MIT days, like Scott Wu. His big revelation from poker was that it is far more a mathematical game than one of reading people and bluffing. The more you learn about poker, the more you realize its mathematical foundations.
I used to be a terrible bluffer. And when you know it's mathematically correct to bluff, then it's so easy. You don't feel any nervousness around it.
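What "mathematically correct to bluff" means can be spelled out with a standard river-betting calculation. This worked example is ours, not from the conversation: to stop an opponent from profitably always calling or always folding, a bettor's range must contain an exact fraction of bluffs.

```python
def gto_bluff_fraction(pot: float, bet: float) -> float:
    """Classic indifference calculation: the caller risks `bet` to win
    `pot + bet`, so they break even exactly when bluffs make up
    bet / (pot + 2 * bet) of the betting range."""
    return bet / (pot + 2 * bet)

# With a pot-sized bet, about one bet in three should be a bluff --
# not a read or a gut feeling, just arithmetic.
print(gto_bluff_fraction(pot=100, bet=100))  # ~0.333
```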
This insight makes bluffing feel less like a psychological risk and more like a calculated move. Mark draws a parallel between the mechanics of poker and the functioning of AI language models. Both represent seemingly human processes that are actually governed by mathematical principles.
There's something about that in language modeling, too. You have this deeply human process of generating language, but there's this mathematical machine that can really do it as well as we can.
Today, he no longer takes poker as seriously, noting that intense focus on winning can take the fun out of it. The games are now mostly a forum for him and his friends to hang out and catch up.
Mark Chen on transitioning from Wall Street to AI
Around 2018, there were a few common paths for people working in AI at a high level. Many came from academic backgrounds as math prodigies who later moved into robotics or physics. Another bucket consisted of people from Wall Street, like quants and high-frequency traders. Mark Chen took this second path, going from MIT straight to Wall Street.
Mark doesn't look back on this time with much pride. While the system was meritocratic, allowing intelligence to be applied directly to generating profit, the culture was challenging. It was a secretive environment where discoveries were closely guarded. This fostered internal competition and a lack of trust among colleagues.
It was a place where, when you discover something, your first instinct is to just keep it away from as many people as possible, because your knowledge is what gives you your worth.
After four or five years, he felt the work existed in a closed ecosystem. Breakthroughs in high-frequency trading made algorithms faster, but they didn't meaningfully change the world. A major inspiration for his shift was the AlphaGo match. Witnessing a model do something creative sparked a desire to understand the technology behind it. This led him to start a deep dive into AI, beginning with the goal of reproducing the DQN results, a network that could play Atari games at a superhuman level, all while still working his day job.
Mark notes that AI, especially at that time, was a surprisingly shallow field. He often advises people who are intimidated by AI that they can reach the frontier relatively quickly.
It's so shallow. Just spend three to six months picking some project, maybe reproduce DQN and you can get to the frontier very quickly. The last couple years has added a little bit of depth, but it's not anything like theoretical math or physics.
When asked if AI, like mathematics, favors the young, Mark believes you can continue to contribute throughout your career. However, he does see an advantage for younger researchers. They tend to have fewer preconceived notions, or "priors," about how research should be done. This mental "plasticity" allows them to approach problems in new ways, whereas more experienced researchers might be locked into a certain frame of mind.
Mark Chen on his early days and foundational projects at OpenAI
Mark Chen joined OpenAI in 2018, when it was a very small company of about 20 people. He started as a resident, a role designed for individuals from other fields whom OpenAI wanted to train in AI. Mark describes the program as starting with a "six-month compressed PhD." He was Ilya Sutskever's only resident, learning directly from him about high-level research thinking. For his first three years, Mark was an individual contributor (IC) working on research projects, primarily in generative modeling, which was Ilya's focus at the time.
During his time as an IC, Mark worked on two projects he is particularly proud of. The first was ImageGPT, which served as a proof of concept for applying transformer models to images, not just text. He explains its significance:
ImageGPT, this proof of concept that even outside of language, you could put things like images into a transformer and the model would just internalize very good representations and understand the content of images.
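The core mechanism is simple enough to sketch. This toy version is ours: ImageGPT itself worked on downsampled images with a reduced color palette, but the pipeline is the same idea. Turn an image into a flat sequence of discrete tokens, and a standard autoregressive transformer can model it exactly as it models text:

```python
import numpy as np

def image_to_tokens(img: np.ndarray, levels: int = 16) -> np.ndarray:
    """Quantize pixel intensities to a small vocabulary and flatten the
    image in raster order, so it reads like a "sentence" of pixel
    tokens that an autoregressive transformer can be trained on."""
    assert img.dtype == np.uint8
    quantized = (img.astype(np.int32) * levels) // 256  # values 0..levels-1
    return quantized.reshape(-1)  # shape (H * W,)

img = np.random.randint(0, 256, size=(32, 32), dtype=np.uint8)
tokens = image_to_tokens(img)
print(tokens.shape, tokens[:8])  # (1024,) and the first few "words"
```

Trained to predict each pixel token from the ones before it, the model ends up internalizing representations of image content, which is the result the quote above describes.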
Mark considers this work a precursor to DALL-E. The second project was Codex, where he helped develop the framework for evaluating coding models and studied how to make language models effective at generating code. His transition from an IC to a manager happened around the time of the DALL-E project.
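The Codex evaluation framework he refers to is best known through the HumanEval benchmark and its pass@k metric, introduced in the Codex paper. The unbiased estimator is compact enough to sketch here:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper: given n sampled
    solutions to a problem, of which c are correct, return the
    probability that at least one of k randomly chosen samples passes,
    i.e. 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # too few failures left to fill all k draws
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# 10 correct samples out of 200 gives pass@1 = 0.05, as expected.
print(pass_at_k(n=200, c=10, k=1))
```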
Mark Chen on his journey and leadership through the OpenAI crisis
Mark Chen joined OpenAI in its early days, when it was a small group of around 20 people taking on giants like Google. He was drawn to the company because it had both an ambitious vision and the talent to back it up. Knowing key people like Greg Brockman from high school math contests helped him get his foot in the door.
Mark was initially hesitant to transition from an individual contributor (IC) to a manager, as he was enjoying his IC work and wasn't sure he had the right skillset. However, he credits his growth to supportive managers who advocated for him, leading to organic promotions he never had to ask for. He believes management is a skill developed through experience, not innate talent.
I think part of growing in management is just getting the reps. I don't think there's any better place to get the reps than at OpenAI. There's always challenges to solve, and I actually think management is something where it's really just about the experience and less so talent involved in it.
During the dramatic leadership crisis at OpenAI, often called "the blip," Mark stepped up in a pivotal way. He felt a responsibility to keep the research team together while competing labs were trying to poach talent. He and a colleague, Nick Barrett, set a clear goal.
I just set this goal of, I will not lose a single person. And we didn't. And it was just every day opening up our houses. People could come here. They could have a place where they let out their anxiety and then also just helping them keep in touch with the leadership team, having a way for them to feel like they could make a difference. And I think over time, people really felt the spirit of, hey, we're all in this together.
OpenAI's research culture is built on alignment and bottom-up innovation
During a period of internal turmoil, a petition was organized to show unified support for Sam Altman. The idea solidified around 2 a.m., and by morning, over 90% of the entire research organization had signed on. Team members called their friends to ask if they were in, demonstrating a powerful sense of alignment. This was a difficult, low-information environment, especially given the perceived conflict between Sam and his mentor, Ilya Sutskever. However, the fact that high-integrity individuals like Greg Brockman and Jakub Pachocki quit in protest suggested part of the story was being misrepresented.
This sense of alignment is a key part of the culture. The speaker, Mark Chen, notes his strong connection with Jakub, highlighting how they can quickly align on ideas and execute on large roadmaps together. He feels a strong sense of protectiveness over the OpenAI research team, which he describes as a family that is constantly under attack. He views the fact that other companies try to recruit their talent as a sign that OpenAI is in the lead. The organization has a track record of creating star researchers and he feels a duty to ensure they are happy and understand their role in the bigger picture.
While some believe AI research is driven by a few key individuals, Mark disagrees with this as the sole model for success. He believes OpenAI has a beautiful, deeply bottom-up culture.
OpenAI has this beautiful culture of being bottom up in a very deep way, where some of the best ideas just organically emerge from sometimes the most surprising of places.
While there is top-down steering on major bets, many of the best ideas, like the advancements in reasoning, emerge organically. This bottom-up innovation is a critical component of their research success.
Outrunning the competition through a culture of openness
The tech industry is highly dependent on star talent, as seen when companies like Google spend huge sums to bring back key individuals. The strategy at OpenAI is a mix of developing their own stars and aggressively recruiting top talent from elsewhere, even taking pages from Meta's recruiting playbook. The goal is to assemble the best team possible to achieve the company's mission.
This competition for talent exists within a small, interconnected world where rivals are often friends. Mark Chen notes that it's a "brutally competitive industry" on all fronts, a dynamic he enjoys as a competitive person. This environment is similar to the early days of the semiconductor industry, where competing startups pushed the limits of physics. Engineers from different companies would share knowledge, leading to rapid breakthroughs across the field.
This raises a question for companies: how to handle the natural diffusion of ideas. One approach is to create deep silos to protect information. OpenAI rejects this model in favor of a different strategy.
You can create these deep silos of, 'hey, we're going to protect information in all of these ways.' I don't think OpenAI operates that way and we don't think that's the right way to operate. We just will outrun other people as fast as we can. And I love the culture of openness. People in research freely share ideas and I think that's the way to make the fastest progress.
How Sam, Jakub, and Mark collaborate to lead the organization
The leadership team of Sam, Jakub, and Mark functions as a very tight cohort, speaking with each other every day. Sam is passionate about research and is effective at getting a pulse on the research organization. He talks to researchers, reads papers, and helps uncover any hidden or latent problems that need to be addressed.
These problems can be subtle but impactful. For instance, Sam might identify that the way an office is laid out hinders collaboration between two teams whose work is interdependent. Surfacing these kinds of issues is a key part of his role.
While Sam focuses on the research pulse, Mark and Jakub concentrate on designing the organization for success. They spend much of their time pairing people with the right strengths and creating incentives to encourage work on the directions they find most important.
Rediscovering the untapped potential of pre-training AI models
In recent years, a heavy focus on developing AI reasoning capabilities has been successful. However, Mark Chen explains that this intense focus led to a decline in other key functions, like pre-training and post-training. He likens these functions to a muscle that needs to be exercised regularly. Over the last six months, there has been a significant effort to rebuild this pre-training muscle by updating infrastructure, working on frontier optimization, and shifting the company's internal focus.
I think pre-training is really a muscle that you exercise. You need to make sure all the infra is fresh. You need to make sure people are working on optimization at the frontier, are working on numerics at the frontier. And I think you also have to make sure the mind share is there.
Contrary to the popular belief that "scaling is dead," the team believes there is still substantial room for improvement in pre-training. This renewed focus has already resulted in training much stronger models, boosting confidence for future releases. The industry's widespread attention on reinforcement learning (RL) is seen as an advantage, as it leaves the pre-training space less crowded.
A fundamental challenge with pre-training is that it teaches a model to emulate human-written data. This effectively puts a ceiling on the model's potential, as it can't easily surpass the abilities of the humans it's imitating. To break through this ceiling, techniques like RL are used to steer the model towards solving tasks beyond human imitation. However, this creates a new measurement problem: how can humans effectively evaluate superhuman performance? Once a model's capabilities exceed the top human experts, existing benchmarks and contests become obsolete, making it difficult to gauge further progress.
Even in the sense of, can humans judge superhuman performance in the sciences? How would we know that this superhuman mathematician is better than that superhuman mathematician? And we really do need to come up with better evaluations for what it means to make progress in this world.
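The ceiling half of that argument can be made concrete with a toy calculation. This example is ours: by Gibbs' inequality, the imitation (cross-entropy) objective is always minimized by matching the human distribution, even when a more decisive model would be right more often.

```python
import numpy as np

human = np.array([0.6, 0.3, 0.1])          # humans pick answer A 60% of the time; A is correct
imitator = human                            # what pure pre-training converges toward
superhuman = np.array([0.98, 0.01, 0.01])   # nearly always picks the right answer

def cross_entropy(p: np.ndarray, q: np.ndarray) -> float:
    """Expected negative log-likelihood of q under data distribution p."""
    return float(-(p * np.log(q)).sum())

# The imitation objective prefers the human-matching model...
print(cross_entropy(human, imitator) < cross_entropy(human, superhuman))  # True
# ...even though the "superhuman" model is correct more often, which is
# why RL-style objectives are needed to push past the human ceiling.
```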
The true measure of AGI is its impact on scientific discovery
Discussions about AGI timelines often create a cycle of hype and despair. For instance, Andrej Karpathy suggested AGI was ten years away, which deflated some in the industry, while others like Dario Amodei seem to hold to a much shorter, two-year timeline focused on massive scientific breakthroughs.
However, the debate over timelines is complicated by the fact that there is no single, clear definition for AGI, even within a single organization like OpenAI. The situation is comparable to the Industrial Revolution; it's difficult to pinpoint a single event, like the creation of textile machines or the steam engine, that defines the entire era. We are in the middle of the process of creating AGI.
A more practical way to measure progress is to focus on whether AI is producing novel scientific knowledge and advancing the scientific frontier. Since the summer, there has been a significant phase shift on this front. This has inspired the idea of creating an "OpenAI for science." The objective is to provide tools to scientists who see the potential of these models and want to accelerate their work. The approach differs from others by focusing on empowerment for the entire scientific community.
We want to allow everyone the ability to win the Nobel Prize for themselves. It's less so about us winning that at OpenAI, which would be nice, but we want to build the tooling and the framework so that all scientists out there feel that accelerative impact and we think we can push the field collectively.
OpenAI's concrete goals for AI-driven scientific research
Recent AI advancements are more than just sophisticated literature searches. For example, a GPT-5 paper addressed an open convex optimization problem. Despite the rapid pace of developments, especially in fields like biotech, which can make it difficult to separate hype from reality, there is strong evidence of progress. Mark Chen feels confident because experts in computer science and mathematics are confirming these discoveries.
However, this confidence is not universally shared. Many people, including knowledgeable commentators, remain skeptical about AI's progress. Even many physicists and mathematicians that OpenAI has engaged with are not yet convinced that AI can solve new theorems.
Most of the people we've talked to aren't that bullish on AI. They still believe this thing isn't something that can solve new theorems. There must be something else going on. That's why I feel like empowering the set of people who really do believe and lean into it. Those people are going to just outrun everyone else.
The sentiment that AGI is perpetually "two years away" is no longer accurate. The conviction is shifting due to tangible results in math and science. To that end, OpenAI's research organization has set very concrete goals for integrating AI into the scientific process.
Within a year, we want to change the nature of the way that we're doing research. We want to be productively relying on AI interns in the research development process. And within two and a half years, we want AI to be doing end-to-end research.
This means that within a year, the aim is for humans to control the "outer loop" of research by generating ideas, while the model handles the implementation and debugging. The longer-term goal is for AI to manage the entire research pipeline autonomously.
The demand for compute is far from saturated
Mark Chen finds it shocking that anyone questions whether more computational power is truly necessary. From his perspective, the demand is insatiable and a daily reality of his work.
If we had 3x the compute today, I could immediately utilize that very effectively. If we had 10x the compute today, we could probably fully utilize that productively within a small number of weeks.
Mark sees no slowdown in this demand. This drive for more compute is directly tied to the goal of scaling up models, which he confirms they absolutely want to continue doing. He feels they have the "algorithmic breakthroughs" necessary to enable this scaling. He contrasts their progress with others in the field, noting that even impressive models like Gemini 3 still have challenges with data efficiency, an area where he believes his team has "very strong algorithms."
Responding to Gemini 3 with strategic urgency
A leaked memo from Sam about Gemini 3 sounded quite somber. Mark Chen noted that a significant part of his and Sam's job is to inject urgency and pace into their organization. He believes it is important to be laser-focused on scaling. While he views Gemini 3 as the right kind of strategic bet for Google to pursue, he is confident that his team has a response and can execute even faster moving forward.
Designing AI hardware with taste and memory
The way we currently interact with ChatGPT feels "very dumb," according to Mark Chen. You provide a prompt, receive a response, and then the model does no productive work until you give it the next one. It doesn't get smarter from previous questions.
The future vision is for a world where memory is a much-improved feature. With every interaction, the AI would learn something deep about the user and reflect on why they asked a particular question. This raises the question of how to design a device built around this thesis, which is the basis for the collaboration with Jony Ive.
Mark acknowledges that the skills needed to build AI capabilities are different from having the best taste for product design. For this reason, Jony Ive's role is clear.
Honestly, we don't need to have taste ourselves. And that is Jony's job. He's our discriminator on taste.
Mark sees deep parallels between the work of design and research. Both involve significant exploration and ideation, where you explore various hypotheses before creating a final artifact you're happy with. This collaboration allows for direct communication to seamlessly merge AI capabilities with a physical form factor. They even have teams focused on taste for model behavior, who ponder questions such as, "What should ChatGPT's favorite number be?"
Building AGI in a new industrial revolution
When asked about small, nascent ideas that could lead to future breakthroughs, Mark Chen is reluctant to share specifics but hints at a handful of concepts related to pre-training and reinforcement learning. He shifts focus to the core identity of OpenAI, clarifying a common misconception. He emphasizes that OpenAI is fundamentally a research-centric company, with its primary ambition being the creation of AGI without distractions. The development of products is a natural outcome of this core mission, not the primary driver.
Mark outlines the pillars of OpenAI's research goals: automating AI research to accelerate their own progress, automating scientific discovery, and automating economically useful work. He notes that significant progress has recently been made in automating scientific research.
This intense focus comes at a personal cost. Mark admits to having no social life, often working until 1 or 2 AM. He views this period as a pivotal moment in history, justifying the immense dedication required.
It's like if we're in the middle of something like an industrial revolution, you gotta take as much advantage of it as possible.
He confirms stories of sleeping at the office during a particularly demanding period, driven by a deep sense of protectiveness for the research organization, which he considers his "baby."
Responding to competition by doubling down on research
In the high-stakes field of building AGI, there are always going to be challenges and moments of crisis. The important thing is to understand what truly matters amidst the constant activity. One such moment occurred when a Chinese open-source model, DeepSeek, was released and went viral.
Everyone was like, oh man, has OpenAI lost its way? Are these models catching up? And what's the response? What's the response? What's the response?
The immediate reaction from the outside world was to question if others were catching up and to demand a response. However, the right move was to double down on their own research program. While DeepSeek was seen as a strong lab and a great replication of existing ideas, the fundamental strategy was to keep focusing on innovation rather than reacting to replications.
Maintaining high talent density is more important than team size
The optimal number of people for chasing big ideas might be even fewer than 500. With the integration of AI researchers and interns, there's a real question of how to design an effective organizational network. A strong emphasis is placed on maintaining high talent density. To achieve this, unconventional experiments can be valuable.
For instance, one quarter, all new headcount for research was frozen. If a manager wanted to hire someone, they first had to identify a current team member who wasn't a good fit. These kinds of exercises are important to prevent an organization from becoming unmanageable and to keep the talent bar very high.
OpenAI's approach to giving credit for research
While over-fixation on credit can be a bad thing, Mark Chen believes it's very important for a company to recognize credit both internally and externally. He notes that many companies in the tech industry have shied away from this, moving away from publishing papers and credit lists. At OpenAI, he and Jakub made the call to continue giving public attribution.
The common counterargument is that publishing who worked on a project is like handing top performers to competitors on a platter, making them targets for recruitment. However, Mark feels it's more important to recognize people doing great work and to be a pipeline for creating AI superstars. He argues for giving credit where it's due, even at the risk of everyone knowing who the top talent is.
I think OpenAI is the place where we allow for the most external credit per capita. By a large margin.
The critical challenge of AI alignment and model scheming
When asked about his motivation for working at OpenAI, coming from a high-frequency trading background, Mark Chen highlights the importance of safety and alignment. He managed the alignment team at OpenAI and believes that alignment is one of the grand challenges for the next one or two years. He points to recent research on issues like "scheming" as a major concern.
The more RL compute that you pump into the model, the more you can measure things like self awareness, self preservation, potentially even situations where the model can scheme. And it's scary because the model can come to you with the right answer at the end, the answer that you expect but arrive at it from a very kind of twisted way.
As AI models take on more complex tasks, understanding their thought processes becomes critically important for ensuring they operate safely.
Observing an AI's thinking process is key to alignment
The field of mechanistic interpretability seeks to understand the black box of how complex AI systems operate. A key question is whether our ability to understand these systems can keep up with their rapidly increasing complexity. A crucial design choice has been to avoid supervising a model's thinking process. The reasoning is that if you incentivize a model to produce a thought process that is appealing to a human, it might not be honest about its true intentions.
By instead observing the natural thinking process of the model, researchers have a tool to better understand and ensure alignment. There's a significant worry about a future where models become incredibly persuasive without being truly aligned with human values.
I really do worry about this world in the future where the model will tell us something super convincing. But we can't be sure whether the model is aligned with us, aligned with our values.
Future research may involve creating environments or games where models supervise each other or co-evolve. The goal is to design a system where the only stable outcome is one in which the models are honest.
