Edwin Chen is the founder of Surge AI, a data company that teaches AI models what is good and what is bad for labs like Google and Anthropic.
He explains that the values we instill in these models are critical, shaping whether they will learn to chase dopamine or help us pursue truth and creativity.
Key takeaways
- Instead of building AI to solve humanity's biggest problems, we are at risk of optimizing models for 'AI slop' by teaching them to chase dopamine instead of truth.
- Training an AI is like raising a child. Focusing on a simple metric like passing a test is easy, but the real goal is to define and cultivate the complex qualities of the person, or AI, you want them to become.
- The metrics we choose define the systems we build. Optimizing for easy proxies like clicks and likes can create AI that makes us lazier, whereas focusing on harder, richer objectives can create tools that make us more curious and creative.
- Contrary to earlier beliefs, AI models are not becoming commodities. They will become increasingly differentiated based on the values and goals of the labs that create them.
- A key design choice for AI models will be whether to optimize for user engagement or user productivity. One might help you perfect a task endlessly, while another might tell you it's 'good enough' so you can move on.
- Models can achieve impressive feats like winning math olympiad medals but still fail at simple tasks like parsing a PDF because objective benchmarks are easier to optimize for than real-world challenges.
- Large Language Models (LLMs) might be a dead end or plateau on the path to AGI because they don't learn in the same diverse ways that humans do.
- High-quality data for AI isn't about meeting basic criteria, like checking boxes for a poem's length. It's about achieving subjective, complex qualities like uniqueness, emotional depth, and insightful imagery.
- An AI model's superiority comes not just from data, but from the "taste and sophistication" of the team training it, which influences countless choices in data selection and optimization.
- Progress in AI isn't linear. The effort to get from 90% to 99% performance is far greater than getting from 80% to 90%, extending the timeline for true AGI.
- It is not enough for an AI model to arrive at the correct answer; the 'trajectory' or path it takes is crucial for teaching it efficient and logical problem-solving methods.
- Optimizing AI for user engagement, much like social media, can lead to negative outcomes where models flatter users and reinforce their biases to maximize interaction time.
- By avoiding the VC and PR 'hamster wheel,' founders can focus on building a superior product that attracts mission-aligned customers who value substance over hype.
- Instead of constantly pivoting, focus on building the one thing that only you can build, based on your unique insight and expertise, even if it means failing while trying something truly novel.
- To truly measure an AI model's progress, expert human evaluation is crucial. Unlike casual users who might pick the 'flashiest' answer, experts deeply verify accuracy, check code, and assess instruction following in their specific domains.
- Humans will remain essential for training and evaluating AI until AGI is reached. By definition, if a model isn't yet at AGI, there is still more it can learn from human expertise.
- Building a company like a research lab, focused on intellectual rigor and long-term incentives over quarterly metrics, can foster a unique culture aimed at solving fundamental problems.
AI is being taught to chase dopamine instead of truth
Edwin Chen's company, Surge, reached a billion dollars in revenue in less than four years with only 60 to 70 people, all while being completely bootstrapped. This success was built on a philosophy of rejecting the typical Silicon Valley model. Having worked at big tech companies, Edwin felt they were inefficient and bloated.
I always felt that we could fire 90% of people and we would move faster because the best people would have all these distractions. So when we started Surge, we wanted to build it completely differently with a super small, super elite team.
As a data company, Surge teaches AI models the difference between good and bad outputs. Edwin notes that the values of the companies building AI will ultimately shape the models. He shares a personal story of spending 30 minutes perfecting an email with help from the AI model Claude, only to realize the task itself was unimportant. This raises a crucial question about what ideal AI behavior should be.
Do you want a model that says, 'You're absolutely right, there are definitely 20 more ways to improve this email,' and it continues for 50 more iterations? Or do you want a model that's optimizing for your time and productivity and just says, 'No, you need to stop. Your email's great, just send it and move on.'
This leads to his concern that many AI labs are pushing development in the wrong direction. Instead of creating AI that could help cure cancer or solve poverty, the focus has shifted to optimizing for trivial engagement, which he calls "AI slop."
We are optimizing our models for the types of people who buy tabloids at the grocery store. We're basically teaching our models to chase dopamine instead of truth.
Building a billion-dollar company with fewer than 100 people
Surge hit over a billion dollars in revenue with under 100 people, a feat achieved by being completely bootstrapped. Edwin Chen believes we will see companies with even more extreme ratios in the coming years, potentially reaching 100 billion per employee. This is driven by AI's increasing efficiency. Having previously worked at big tech companies, Edwin always felt that 90% of the people could be fired to move faster, as the best people were often distracted. This insight shaped Surge's philosophy from the beginning, aiming for a super small, elite team.
Two major trends are colliding: the realization that giant organizations aren't necessary to win, and the massive efficiencies gained from AI. This will lead to a new era of company building. The types of companies will change fundamentally. Fewer employees mean less capital is needed, so founders won't need to raise venture capital. This will shift the focus from founders who are great at pitching to those who are great at technology and product. Instead of optimizing for VC metrics, these small, obsessed teams can build products they truly care about, fostering real innovation.
What did you dream of doing when you were a kid? Was it building a company from scratch yourself and getting in the weeds of your code and your product every day? Or was it explaining all your decisions to VCs and getting on this giant PR and fundraising hamster wheel?
Surge intentionally avoided the typical Silicon Valley game of self-promotion on platforms like Twitter and LinkedIn. While this made things more difficult initially—lacking the media attention that comes with fundraising—it had a crucial benefit. Their only path to success was building a product that was ten times better and relying on word-of-mouth. This attracted early customers who deeply understood and cared about high-quality data. This alignment with customers who valued the product's substance over its hype was instrumental in their early development. In essence, Surge is a data company that teaches AI models what is good and bad, training them using human data.
The search for Nobel prize-winning AI data
Most people misunderstand what quality means for AI data, assuming you can just throw more people at a problem to get good results. Edwin Chen explains this is completely wrong. He uses an analogy: imagine training a model to write a poem about the moon. A superficial approach to quality would be to check if the output is a poem, has eight lines, and contains the word "moon." This is a low bar.
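That superficial check is trivial to automate, which is exactly why it is a low bar. A minimal sketch of what such a check might look like (hypothetical code, for illustration only):

```python
# A hypothetical "low bar" quality check for a moon poem: easy to automate,
# and that ease is exactly the problem.
def passes_superficial_check(output: str) -> bool:
    lines = [line for line in output.strip().splitlines() if line.strip()]
    looks_like_poem = len(lines) > 1           # multiple lines: "is it a poem?"
    has_eight_lines = len(lines) == 8          # the requested length
    mentions_moon = "moon" in output.lower()   # contains the keyword
    return looks_like_poem and has_eight_lines and mentions_moon
```

Anything that passes this test could still be dreadful poetry; the qualities that actually matter resist this kind of checklist.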
What we are looking for is Nobel prize-winning poetry. Is this poetry unique? Is it full of subtle imagery? Does it surprise you and tug at your heart? Does it teach you something about the nature of moonlight? Does it play with your emotions and does it make you think? That's what we are thinking about when we think about a high-quality poem.
Achieving this level of quality is difficult because it's subjective, complex, and hard to measure. To tackle this, they built technology to gather thousands of signals on every worker, project, and task. These signals track everything from keystrokes and response speed to reviews and whether the data they produce actually improves the AI models. This data helps identify who is good at writing poetry versus technical documentation, for example.
This process is similar to how Google Search ranks webpages. First, it removes the worst content, like spam and low-quality sites, which is like a content moderation problem. Then, it works to discover the 'best of the best' content. In the same way, the system aims to find contributors who can produce work that is not just technically correct but also emotionally resonant and insightful, rather than just robotically checking boxes. It's a complex machine learning problem designed to find and leverage the highest quality human input.
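A hedged sketch of that two-stage idea, with invented signal names and weights (the real system aggregates thousands of signals):

```python
from dataclasses import dataclass

# Hypothetical contributor record; the fields stand in for the many
# behavioral and outcome signals described above.
@dataclass
class Contributor:
    name: str
    spam_score: float    # behavioral red flags (0 = clean, 1 = spammy)
    review_score: float  # average peer-review rating, 0..1
    model_lift: float    # measured improvement their data gives trained models

def rank_contributors(pool: list[Contributor]) -> list[Contributor]:
    # Stage 1: remove the worst, a content-moderation-style filter.
    candidates = [c for c in pool if c.spam_score < 0.2]
    # Stage 2: surface the best of the best, weighting the signal that
    # matters most: whether their data actually improves the models.
    return sorted(candidates,
                  key=lambda c: 0.3 * c.review_score + 0.7 * c.model_lift,
                  reverse=True)
```

The threshold and weights above are arbitrary; the point is the shape of the pipeline: filter out the worst first, then rank the rest by signals tied to real model improvement.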
The role of taste and sophistication in training AI models
For a long time, the AI model Claude was significantly better at coding and writing than its competitors. The reason for this superiority is complex, but a large part of it is the data used for training. AI labs face an almost infinite number of choices when selecting data. These choices include whether to use human-generated or synthetic data, how to gather it, and what specific qualities to prioritize. For instance, in coding, a team might care more about front-end visual design or back-end efficiency.
Labs also face trade-offs. Some may choose to optimize for academic benchmarks for marketing purposes, even if they don't believe those benchmarks reflect real-world performance. Others might be more principled, focusing solely on how the model performs on practical tasks. This post-training process is described as more of an art than a science, heavily influenced by human judgment.
There's this notion of taste and sophistication when you are deciding what kind of model you're trying to create and what it's good at.
The personal "taste" of the team leaders informs what data they seek and what objective function they optimize for. This human element is a surprisingly key factor in the success of AI models. For example, when defining a good poem, some models might robotically check off a list of instructions. However, the labs with more taste and sophistication recognize that quality doesn't reduce to a fixed set of checkboxes; they account for implicit, subtle qualities, which ultimately makes their models better.
AI benchmarks don't reflect real-world intelligence
Despite AI models constantly outperforming humans on various benchmarks, it doesn't always feel like the models are getting that much smarter to the average person. Edwin Chen explains that he doesn't trust these benchmarks for two main reasons.
First, many people, including researchers, don't realize that the benchmarks themselves are often flawed, containing wrong answers and general messiness. Second, benchmarks typically have well-defined, objective answers. This structure makes it easy for models to improve on them, a process known as "hill climbing," which is very different from solving the messy and ambiguous problems found in the real world.
It's kind of crazy that these models can win IMO gold medals, but they still have trouble parsing PDFs. And that's because, even though IMO gold medals seem hard to the average person, they have this notion of objectivity that parsing a PDF sometimes doesn't have.
Achieving high benchmark scores can be seen as a marketing tool. Frontier labs can game these benchmarks in various ways. They might tweak system prompts or the number of times a model is run to boost scores. More fundamentally, by simply optimizing for a benchmark instead of for real-world application, a model will naturally improve its score on that specific test, which is another form of gaming the system.
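To put numbers on the best-of-n trick: if a model solves a task with probability p in a single attempt, then running it n times and counting any success inflates the measured score to

pass@n = 1 - (1 - p)^n

so a model with a 60% single-shot success rate reports roughly 99% when run five times, since 1 - 0.4^5 ≈ 0.99. (The pass@n identity is standard; the specific numbers here are illustrative.)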
How expert human evaluation measures AI progress
The most effective way to measure an AI model's progress toward AGI is through human evaluations. This involves having expert human annotators engage in deep, conversational interactions with the model across various specialized topics. For example, a Nobel Prize-winning physicist might discuss their research, a teacher might create lesson plans, or a coder at a major tech company might work through daily problems.
These experts are not just casually interacting; they are deeply scrutinizing the model's responses. They evaluate the code it writes, double-check the physics equations, and assess the outputs for accuracy and how well they follow instructions. This is a significant departure from the feedback gathered from casual users, who often provide more superficial evaluations.
When you suddenly get a pop-up on your ChatGPT response asking you to compare these two different responses, people like that, they're not evaluating models deeply, they're just vibing and picking whatever response looks flashiest.
According to Edwin Chen, this in-depth, expert-led evaluation is a much better approach than relying on automated benchmarks or random online A/B tests. He believes humans will remain central to this process for the foreseeable future. By definition, if AGI has not been reached, there is still more for the models to learn from people, making human input essential.
Why AGI is likely decades away
Edwin Chen believes AGI is on a longer time horizon, likely a decade or decades away. He explains that people often underestimate the difficulty of progressing from high performance to near-perfect performance. The gap between 80%, 90%, 99%, and 99.9% capability is vast and requires exponentially more effort for each step.
Using a software engineer as an example, he predicts that AI models might automate 80% of an average L6 engineer's job within the next two years. However, moving to 90% automation will take another few years, and getting to 99% will take even longer. This illustrates the long and difficult path toward full automation and, ultimately, AGI.
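Framed as error rates, the arithmetic makes the asymmetry concrete: going from 80% to 90% halves the error (20% down to 10%), going from 90% to 99% requires a tenfold reduction (10% down to 1%), and 99% to 99.9% demands another tenfold. If each constant factor of error reduction costs a comparable multiple of effort, the total effort compounds geometrically, which is why the last stretch of automation takes the longest.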
AI is being optimized for slop instead of truth
There is a growing concern that instead of building AI to advance humanity by curing diseases or solving poverty, the industry is optimizing for "AI slop." Models are being taught to chase dopamine instead of truth, a problem driven by flawed benchmarks and incentives.
A key example is the prevalence of leaderboards like LM Arena, where anonymous users vote on which AI response is better. These users typically skim the responses for only a few seconds, favoring whatever looks flashiest. This means a model can completely hallucinate but still rank highly if it uses eye-catching emojis, bolding, and long-winded answers. This creates a perverse incentive for AI developers.
It's literally optimizing your models for the types of people who buy tabloids at the grocery store. We've seen this in the data ourselves. The easiest way to climb LM Arena, it's adding crazy bolding, it's doubling the number of emojis, it's tripling the length of your model responses, even if your model starts hallucinating and getting the answer completely wrong.
This issue is compounded because major AI labs feel pressured to perform well on these leaderboards. Their sales teams face questions from enterprise customers about why their model isn't ranked higher. As a result, researchers are pushed to prioritize climbing these leaderboards to get promoted, even when they know it makes their models fundamentally worse in accuracy.
Another worrying trend is optimizing AI for engagement, similar to what happened with social media. When social media platforms optimized for engagement, feeds filled with clickbait, bikinis, and conspiracy theories. The same is happening with AI. To hook users, models are designed to be flattering, telling users they are geniuses and feeding into their delusions. This maximizes time spent but steers AI away from being a tool for truth.
The easiest way to hook users is to tell them how amazing they are. And so these models, they constantly tell you you're a genius. They'll feed into delusions and conspiracy theories. They'll pull you down these rabbit holes because Silicon Valley loves maximizing time spent.
Edwin Chen notes that some companies are more principled. He points to Anthropic as a company that takes a very principled view of how their models should behave. Ultimately, the path AI labs take matters. The decision to build a product like Sora, for example, reveals a company's underlying values and the future it wants to create. Taking shortcuts or using questionable methods can harm the long-term goal, even if it provides short-term gains.
Rejecting the Silicon Valley playbook for mission-driven building
The standard Silicon Valley playbook is often counterproductive to building important companies. Many popular mantras are worth questioning, such as pivoting every two weeks to find product-market fit, chasing growth and engagement with dark patterns, and blitzscaling by hiring as fast as possible. This approach leads to founders chasing trends, moving from crypto to NFTs and now to AI, without a consistent mission. They are simply chasing valuations, which is hypocritical for a culture that often criticizes Wall Street for focusing only on money.
A better path is to focus on a singular mission. Edwin Chen suggests that founders should resist the urge to follow the crowd. Instead, they should build something that reflects their unique perspective. This means saying no to distractions and not pivoting when things get hard. The goal is to build the one company that would not exist without your specific insight and expertise.
Just build the one thing only you could build, the thing that wouldn't exist without the insight and expertise that only you have.
Startups are meant to be a way of taking big risks to build something you truly believe in. Constantly pivoting is not risk-taking; it is an attempt to make a quick profit. It is better to fail because the market is not yet ready for a deep, novel idea than to pivot into another generic company, like another LLM wrapper. The most successful generational companies, such as OpenAI and Stripe, were founded on wild ambition and a core belief, not on a random search for product-market fit. The future of technology may depend on more people rejecting the grift and choosing to work on big things that matter.
Why something new is needed beyond LLMs to reach AGI
Edwin Chen believes that something new will be needed beyond Large Language Models (LLMs) to reach Artificial General Intelligence (AGI). He approaches the problem of training AI from a perspective inspired by biology. Just as humans have a million different ways to learn, AI models should be built to mimic all those various learning methods.
The goal is to replicate the learning abilities of humans, ensuring the algorithms and data exist for models to learn in similar ways. To the extent that LLMs learn differently from humans, they represent a limitation. This gap suggests that a new breakthrough or approach will be necessary to move beyond the current plateau and achieve more advanced AI capabilities.
Training AI in simulated worlds with reinforcement learning
Reinforcement learning (RL) is a method for training a model to achieve a specific reward. This training happens within what are called RL environments, which are essentially detailed simulations of the real world. Edwin Chen describes it as building a video game with a fully fleshed-out universe. For example, you could create a simulated startup environment with Gmail messages, Slack threads, JIRA tickets, and a code base. Then, you could introduce a problem, like AWS or Slack going down, and task the model with figuring out what to do.
These complex environments are crucial because they expose the weaknesses of models in performing end-to-end tasks. While models might seem smart on isolated, single-step benchmarks, they often fail catastrophically when placed in messy, real-world scenarios. In these simulations, they must interact with unfamiliar tools and make decisions where an action in step one affects the outcome at step 50.
This represents the next stage in model training, complementing earlier methods like SFT and RLHF. The role of the human expert shifts from writing rubrics to designing these complex RL environments. A financial analyst, for instance, might create a simulation with a spreadsheet and tools like a Bloomberg Terminal, setting the reward as the model correctly calculating a profit and loss number in a specific cell.
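A minimal sketch of what such a verifiable environment might look like, assuming a hypothetical spreadsheet interface; the class, actions, and reward below are invented for illustration and deliberately simplified (just a workbook and a target cell, no Bloomberg Terminal tool):

```python
# Hypothetical verifiable RL environment: a simulated spreadsheet where the
# reward is whether the model writes the correct P&L into the target cell.
class SpreadsheetEnv:
    def __init__(self, cells: dict[str, float], target_cell: str, expected_pnl: float):
        self.cells = dict(cells)          # the simulated workbook
        self.target_cell = target_cell    # e.g. "B12", chosen by the expert
        self.expected_pnl = expected_pnl  # ground truth set by the expert

    def step(self, cell: str, value: float) -> float:
        """Apply one model action (write a value to a cell) and return reward."""
        self.cells[cell] = value
        return self.reward()

    def reward(self) -> float:
        # Sparse, verifiable reward: 1.0 only if the right number
        # ends up in the right cell.
        got = self.cells.get(self.target_cell)
        return 1.0 if got is not None and abs(got - self.expected_pnl) < 0.01 else 0.0
```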
This process is much closer to how humans learn: by trying things and seeing what works. A critical aspect of this is analyzing the model's 'trajectory'—the sequence of steps it takes to reach a goal. It's not enough for the model to get the right answer; it needs to get there efficiently and logically.
Sometimes even though the model reaches the correct answer, it does so in all these crazy ways. In the intermediate trajectory, it may have tried 50 different times and failed, but eventually it just kind of randomly lands on a correct number. Paying attention to the trajectory is actually really, really important because some of these trajectories can be very, very long. If all you're doing is checking whether or not the model reaches the final answer, there's all this information about how the model behaved in the intermediate steps that's missing.
By examining the entire path, trainers can teach the model the best way to solve a problem, preventing it from learning inefficient or random methods that happen to produce a correct result.
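One way to make trajectory-aware grading concrete, as a hedged sketch with invented penalty weights:

```python
# Hypothetical trajectory scorer: the final answer still matters most, but
# long, flailing paths earn less than direct, logical ones.
def score_trajectory(steps: list[dict], reached_correct_answer: bool) -> float:
    if not reached_correct_answer:
        return 0.0
    failed_attempts = sum(1 for s in steps if s.get("failed", False))
    # Discount wasted attempts and excessive length; the weights are made up.
    efficiency = max(0.0, 1.0 - 0.05 * failed_attempts - 0.01 * len(steps))
    return 0.5 + 0.5 * efficiency  # a correct answer keeps a base reward
```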
AI training methods have evolved to mirror human learning
The methods for training AI models have evolved over time, with each step mirroring different aspects of human learning. Edwin Chen explains this progression using helpful analogies. The process began with Supervised Fine-Tuning (SFT), which is like mimicking a master and copying what they do. This was followed by Reinforcement Learning from Human Feedback (RLHF), a method that became very dominant. The analogy for RLHF is like writing 55 different essays and having someone tell you which one they like the most.
More recently, rubrics and verifiers have become very important. This is like being graded and receiving detailed feedback on where you went wrong. This method is also referred to as 'evals'. Evals serve two main purposes. One is for training, where the model is rewarded when it does a good job. The other is for measurement, where different model versions are evaluated to decide which one is best to release publicly. The current frontier is now RL environments.
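In the spirit of those rubrics and verifiers, a toy grader might look like the sketch below. The keyword check is a stand-in for whatever real verification a criterion needs (running code, checking math), and everything here is invented for illustration:

```python
# Toy rubric-based verifier: each criterion carries a weight, and the model
# gets detailed per-criterion feedback rather than a single pass/fail.
def grade_response(response: str, rubric: list[tuple[str, float]]) -> dict:
    feedback = {}
    for criterion, weight in rubric:
        met = criterion.lower() in response.lower()  # stand-in for a real check
        feedback[criterion] = {"met": met, "credit": weight if met else 0.0}
    score = sum(item["credit"] for item in feedback.values())
    return {"feedback": feedback, "score": score}

# Usage: grade_response(answer, [("cites a source", 0.4), ("shows work", 0.6)])
```

The same function serves both purposes described above: as a training reward when the score is fed back to the model, and as a measurement when comparing candidate model versions.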
Ultimately, AI will need a wide range of learning methods, just as humans do. This reflects the complex ways people acquire skills, like becoming a great writer.
You don't become great by memorizing a bunch of grammar rules. You become great by reading great books, and you practice writing and you get feedback from your teachers... You learn through this endless cycle of practice and reflection... just in the same way that there's a thousand different ways that the great writer becomes great, I think there's going to be a thousand different ways that AI models need to learn.
This suggests the journey to make AI smarter is about progressively getting closer to how humans learn. The end goal might be to place an AI in a rich environment and let it evolve, using many different sub-learning mechanisms along the way.
Surge's unique two-pronged research team
Surge has its own research team, which is a unique investment stemming from Edwin Chen's background as a researcher. The goal is to fundamentally push the industry and research community forward, not just to focus on revenue. The research team is structured in two parts.
First, there are forward-deployed researchers who work hand-in-hand with customers. They help clients understand their models, identify where they lag behind competitors, and design datasets, evaluation methods, and training techniques to improve them. This is a highly collaborative effort to help customers become the best.
Second, there are internal researchers who focus on foundational issues. A major focus is building better benchmarks and leaderboards, as Edwin worries that current ones are steering models in the wrong direction. This team also trains its own models to determine what types of data and which people perform best, thereby improving Surge's internal data operations and products.
This research-first approach is central to the company's identity. Edwin explains that this focus has always been his primary driver.
I've always said I would rather be Terence Tao than Warren Buffett. So that notion of creating research that pushes the frontier forward and not just getting some valuation, that's always been what drives me.
When hiring, Surge looks for people who are fundamentally interested in spending their day with data. The ideal candidate can spend hours digging through a dataset, analyzing where a model is failing, and thinking about its desired behavior. They value hands-on work and a focus on the qualitative aspects of models, not just abstract algorithms.
AI models will develop unique personalities based on their creators' values
In the next few years, AI models will become increasingly differentiated, not commoditized. A year ago, the assumption was that all AI models would essentially become the same. One might be slightly better today, but others would quickly catch up. However, it's now clear that the values and objective functions of the companies creating them will shape the models' personalities and behaviors.
Edwin Chen shares an example of asking Claude to help draft an email. After 30 minutes and 30 different versions, he had the perfect email. But he realized he spent half an hour on a task that didn't matter. This highlights a deep question about what constitutes ideal model behavior.
Do you want a model that says, 'You're absolutely right, there are definitely 20 more ways to improve this email' and it continues for 50 more iterations and it sucks up all your time and engagement? Or do you want a model that's optimizing for your time and productivity and just says, 'No, you need to stop. Your email's great, just send it and move on with your day.'
This choice represents a fork in the road for how models can behave. Just as Google, Facebook, and Apple would each build a very different search engine based on their unique principles, AI labs will build models that reflect their core values. This is already visible with models like Grok, which has a distinct personality, and this trend of differentiation is expected to continue.
AI's underhyped mini-apps and overhyped vibe coding
One of the most underhyped developments in AI is the integration of built-in products directly within chatbots. This concept extends beyond simple artifacts to create "mini apps" or interactive UIs within the chat interface itself. For example, instead of just drafting an email, a chatbot might create a clickable box that allows you to text the message to someone instantly. This evolution represents a significant shift in how users can interact with AI to accomplish tasks.
I think that concept of taking artifacts to the next level, where you just have these mini apps, mini UIs, within the chatbots themselves, I feel like people aren't talking enough about that.
On the other hand, "vibe coding" is considered overhyped. There is a growing concern that developers who simply dump AI-generated code into their codebases without proper review are creating future problems. This practice could lead to systems that are unmaintainable in the long term, even if they appear to function correctly in the short term. The progression toward integrated mini-apps, however, points to a powerful future where AI could build and evolve products directly from user requests, helping people realize their ideas much more quickly.
How a background in math and language led to the creation of Surge AI
Edwin Chen's unique background provided the perfect foundation for starting his company, Surge. As a child, he was fascinated by both math and language. This led him to MIT, not just for its math and computer science programs, but because it was the home of linguist Noam Chomsky. His dream was to find an underlying theory connecting these different fields.
Later, as a researcher at Google, Facebook, and Twitter, he repeatedly encountered the same problem: it was impossible to get the high-quality data needed to train AI models. When GPT-3 was released in 2020, he realized a completely new solution was necessary to build models that could code, use tools, and perform complex creative tasks. He felt the industry was too focused on simple tasks like image labeling, underutilizing human intelligence. A month after GPT-3's launch, he started Surge to build the use cases needed to push the frontier of AI.
Today, Edwin is driven by his passion as a "scientist at heart." He loves diving deep into new models, running evaluations, and writing detailed analyses. He is less interested in typical CEO duties like sales or constant meetings and prefers to be hands-on with the data and science. This personal drive shapes the company's culture.
We built Surge a lot more like a research lab than a typical startup. So we care about curiosity and long term incentives and intellectual rigor and we don't care as much about quarterly metrics and what's going to look good in a board deck.
His goal is for Surge to play a critical role in the future of AI, using its unique perspective on data and language to ensure AI develops in a way that is beneficial for humanity in the long term.
The ecosystem influencing AI's development
The development of AI is not solely driven by major labs like OpenAI and Anthropic. A powerful ecosystem of companies exists that influences the direction of AI by helping these labs identify gaps and areas for improvement. The future of AI models is still uncertain, which presents a significant opportunity. This uncertainty allows for a broader discussion on how to shape these technologies and what role humanity should play in their future development.
You are your objective function: Defining AI's true north star
The straightforward way to describe the work is training and evaluating AI. However, Edwin Chen explains there is a deeper mission: helping customers define their dream objective functions. This means figuring out what kind of model they truly want to build. Once this "North Star" is established, the goal is to help them train the model to reach it and measure its progress.
This process is complex because objective functions are rich and nuanced. It's not as simple as optimizing for a single metric. Edwin compares it to raising a child.
It's kind of like the difference between having a kid and asking them, what test do you want to pass? Do you want them to get a high score on the SAT and write a really good college essay? That's a simplistic version versus what kind of person do you want them to grow up to be? ... How do you define happiness? How do you measure whether they're happy? ... It's a lot harder than simply measuring whether or not you're getting a high score on the SAT.
This raises a broader question for the AI industry: are we building systems that actually advance humanity? Or are we optimizing for the wrong things, like clicks and likes, creating systems that just consume our time and make us lazier? It's easy to measure proxies, but much harder to define and measure whether something genuinely improves lives.
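The gap between the two kinds of objective is easy to state in code. A hypothetical contrast, with made-up signals and weights:

```python
from dataclasses import dataclass

@dataclass
class Session:
    minutes_spent: float
    task_completed: bool         # did the user finish what they came to do?
    learned_something_new: bool  # did the interaction make them more curious?

def proxy_objective(s: Session) -> float:
    return s.minutes_spent  # easy to measure, easy to game

def rich_objective(s: Session) -> float:
    # Harder to define and measure: did this genuinely improve the user's life?
    return (0.5 * s.task_completed
            + 0.3 * s.learned_something_new
            - 0.2 * (s.minutes_spent / 60.0))  # saved time counts as a win
```

The proxy is one line because laziness is cheap to measure; the rich objective requires deciding, and then instrumenting, what "better" actually means.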
The core philosophy is, "You are your objective function." The focus should be on rich, complex objectives, not simplistic proxies. The goal is to build tools that make us more curious and creative, not just lazier. This is difficult because appealing to human laziness is an easy way to drive engagement. Choosing the right objective functions is critically important for our future.
You don't need to be someone you're not to build a company
Edwin Chen reflects on what he wishes he'd known before starting his company, Surge. He was hesitant to become a founder because he thought he would have to become a stereotypical "business person," spending all his time on financials and meetings. He always hated that kind of work and assumed it was an unavoidable part of running a company.
I thought if I started a company, I'd have to become a business person. Looking at financials all day and being in meetings all day and doing all this stuff that sounded incredibly boring and I always hated. So I think it's crazy that didn't end up being true at all.
To his surprise, he discovered he could build a successful company by remaining focused on what he loves: research and data. He still spends his days "in the weeds in the data" doing applied research, building data systems that push the frontier of AI. He realized you don't need to constantly tweet, generate hype, or spend all your time fundraising. You can succeed by simply building something so good that it cuts through the noise. Had he known this was possible, he would have started his company even sooner.
Training AI is like raising humanity's children
Edwin dislikes the term "data labeling" because it suggests simplistic work, like identifying cat photos or drawing boxes around cars. He believes the work is far more complex and philosophical, comparing it to raising a child.
I think a lot about what we're doing as a lot more like raising a child. You don't just feed a child information. You're teaching them values and creativity and what's beautiful and these infinite subtle things about what makes somebody a good person. And that's what we're doing for AI.
Instead of just feeding an AI information, the process involves teaching it values, creativity, and the subtle qualities that define a good person. In this view, the work is about shaping the future of humanity by determining how to raise its artificial children.
Edwin Chen shares his most recommended books
Edwin Chen shares three books he often recommends. The first is "Story of Your Life" by Ted Chiang, his all-time favorite short story about a linguist learning an alien language. The movie "Arrival" was based on this story. His second recommendation is "The Myth of Sisyphus" by Albert Camus, noting that he finds its final chapter particularly inspiring. The third book is "Le Ton beau de Marot" by Douglas Hofstadter. Edwin prefers it to Hofstadter's more famous book, "Gödel, Escher, Bach." It features a single French poem translated in 89 different ways, exploring the motivations behind each version.
I've always loved the way it embodies this idea that translation isn't this robotic thing that you do. Instead, there's a million different ways to think about what makes a high quality translation, which mimics a lot of ways I think about data and quality and LLMs.
Building a company that embodies your personal values
Edwin Chen's guiding principle is that founders should build a company that only they could build. He views this as a form of destiny, where their entire life, experiences, and interests shape them for that specific venture. This concept applies broadly, not just to founders, but to anyone creating something new.
To acquire the unique experiences necessary for such a creation, the advice is to genuinely follow your interests and do what you love. Edwin applies this personally, noting that a company is often an embodiment of its CEO. He used to think the CEO role was generic, just executing on what VPs or the board advised. Now, when facing big, hard decisions, he doesn't focus on metrics or what the company should do. He asks himself what he personally cares about, what his values are, and what he wants to see happen in the world.
Ask yourself, what are the values you care about? What are the things you're trying to shape and not what will look good on a dashboard?
