
Conversations with Tyler

Brendan Foody on Teaching AI and the Future of Knowledge Work

Jan 7, 2026 · 22 min read

Brendan Foody is the founder of Mercor and the youngest unicorn founder on record.

He joins Tyler Cowen to discuss how hiring experts to train AI is shifting knowledge work away from repetitive analysis and toward building learning environments.

This conversation highlights why human taste and specialized rubrics are the most valuable assets for scaling intelligence in the modern economy.

Key takeaways

  • Expert humans can command high hourly rates in AI training because their specific insights are scaled to billions of users through the model.
  • Effective AI evaluation requires a mix of expert consensus and healthy disagreement to ensure the model handles edge cases rather than just following the norm.
  • The primary challenge for AI in professional fields is replicating taste and uncodified knowledge that experts have not documented in writing.
  • Evaluation data is more valuable than raw output because it provides a rubric that allows AI models to learn through iteration and scoring.
  • AI models could be calibrated using the specific tastes of historical peak periods, such as 1980s heavy metal, rather than relying on modern consensus.
  • Knowledge work is shifting toward a model where humans teach an agent a task once through reinforcement learning to automate it forever.
  • AI might be more trusted than human experts because it lacks a personality and the associated social friction.
  • People may respect expert knowledge more when it is delivered as an impersonal distillation through a machine rather than a person.
  • AI models will soon reach 75 percent automation in many fields, but the final 25 percent of expertise will remain a human-driven bottleneck for the foreseeable future.
  • Software demand is highly price elastic, meaning increased efficiency could lead to significantly more software engineers rather than fewer jobs.
  • The most effective way to assess talent is to measure actual skills through projects rather than relying on personal vibes or background similarities.
  • Measuring a candidate's slope or future growth potential is significantly harder than measuring their current skills or Y-intercept.
  • Instead of eliminating second chances, AI may help late bloomers by identifying why they failed in one role and finding a different environment where they can excel.
  • Instead of banning AI tools during hiring assessments, employers should evaluate how candidates use them to deliver real impact.
  • AI allows elite selection processes like the Thiel Fellowship to scale beyond local referral networks and identify unconventional talent globally.
  • Labor arbitrage can be achieved by paying employees in a product they value at retail price while the owner pays the wholesale cost.
  • Dyslexia often forces individuals to learn delegation early in life because they must rely on others for tasks like reading.
  • Career success is less about fixing every weakness and more about identifying and leveraging your unique comparative advantages.
  • Better matching technology, such as more efficient dating apps, could improve happiness in places like San Francisco, where gender imbalances in certain industries make dating especially hard for young men.
  • AI evaluation should move toward measuring how models handle complex tasks that take humans days or weeks to complete.


Why poets are paid 150 dollars an hour to train AI

01:08 - 03:48

Brendan explains that poets can earn 150 dollars an hour by training AI models. This high pay is possible because the expertise of a single poet can be scaled to reach billions of users. When a top-tier poet teaches an AI model how to write or evaluate poetry, that knowledge is embedded into the software forever.

One of the reasons that we are able to pay so well to attract the best talent is that when we have these phenomenal poets that teach the models how to do things, they are then able to apply those skills and that knowledge across billions of users.

The work involves creating rubrics and evaluations similar to how an English professor might grade an essay. These experts define what a desirable response looks like for the model. However, finding the right talent in subjective fields like the liberal arts is difficult. Brendan looks for a specific balance where experts generally agree on quality but still have room to disagree on the finer points.

You want some degree of consensus of different exceptional people believing that they are each doing a good job. But you probably don't want too much consensus because you also want to get all of these edge case scenarios of what the models are doing that might deviate a little bit from what the norm is.
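Brendan's "some consensus, but not too much" can be made concrete by looking at how tightly graders' scores cluster on each item. The following is a minimal sketch under stated assumptions: a 1-to-5 scoring scale, hypothetical scores, and arbitrary spread thresholds (`CONSENSUS_MAX_SPREAD`, `DISAGREEMENT_MAX_SPREAD`) that do not come from the episode.

```python
from statistics import mean, pstdev

# Hypothetical data: each response is scored 1-5 by several expert graders.
SCORES: dict[str, list[int]] = {
    "poem_a": [5, 5, 4, 5],
    "poem_b": [4, 2, 5, 3],
    "poem_c": [1, 5, 1, 5],
}

# Assumed thresholds on the population standard deviation of the scores.
CONSENSUS_MAX_SPREAD = 0.6
DISAGREEMENT_MAX_SPREAD = 1.5


def classify(scores: list[int]) -> str:
    """Label one item by how tightly the graders' scores cluster."""
    spread = pstdev(scores)
    if spread <= CONSENSUS_MAX_SPREAD:
        return "consensus"
    if spread <= DISAGREEMENT_MAX_SPREAD:
        return "healthy_disagreement"  # worth keeping as an edge case
    return "no_consensus"  # the rubric or the graders may need review


def summarize(data: dict[str, list[int]]) -> dict[str, tuple[float, str]]:
    """Map each item to (mean score, agreement label)."""
    return {item: (round(mean(s), 2), classify(s)) for item, s in data.items()}


if __name__ == "__main__":
    for item, (avg, label) in summarize(SCORES).items():
        print(f"{item}: mean={avg} -> {label}")
```

The standard deviation is used here only for simplicity; a production pipeline might prefer an established agreement statistic such as Krippendorff's alpha.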

Tyler asks if AI can be used to monitor the quality of the human graders. Brendan confirms that models are often used to review human work. Since the platform manages tens of thousands of people, AI helps identify when a human grader is tired or not putting in enough effort. The models can provide a signal to ensure the human-generated rubrics remain accurate and high quality.
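One way a model's output could serve as a quality signal on human graders is to compare each grader's scores with a model's reference scores and to watch for implausibly fast reviews. This is a hypothetical sketch: the 0-to-10 scale, the `MODEL_SCORES` values, and both thresholds are assumptions, and a flag only means "have a person take a second look," since the model's score is not ground truth.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Grade:
    grader: str
    item: str
    score: float          # human score on an assumed 0-10 scale
    seconds_spent: float  # time the grader spent on the item


# Hypothetical reference scores produced by a model for the same items.
MODEL_SCORES = {"q1": 8.0, "q2": 3.0, "q3": 6.0}

MAX_MEAN_ABS_DIFF = 2.0    # assumed: tolerated average gap from the model's score
MIN_MEDIAN_SECONDS = 60.0  # assumed: floor for a careful review

# Hypothetical sample: "ana" is careful, "ben" rushes and diverges from the model.
SAMPLE_GRADES = [
    Grade("ana", "q1", 8, 300), Grade("ana", "q2", 4, 240), Grade("ana", "q3", 6, 280),
    Grade("ben", "q1", 2, 20), Grade("ben", "q2", 9, 15), Grade("ben", "q3", 5, 25),
]


def flag_graders(grades: list[Grade]) -> dict[str, list[str]]:
    """Return grader -> reasons for a human reviewer to check their work."""
    by_grader: dict[str, list[Grade]] = {}
    for g in grades:
        by_grader.setdefault(g.grader, []).append(g)

    flags: dict[str, list[str]] = {}
    for grader, gs in by_grader.items():
        reasons = []
        gaps = [abs(g.score - MODEL_SCORES[g.item]) for g in gs]
        if sum(gaps) / len(gaps) > MAX_MEAN_ABS_DIFF:
            reasons.append("diverges_from_model")
        if median(g.seconds_spent for g in gs) < MIN_MEDIAN_SECONDS:
            reasons.append("too_fast")  # a possible sign of fatigue or low effort
        if reasons:
            flags[grader] = reasons
    return flags


if __name__ == "__main__":
    print(flag_graders(SAMPLE_GRADES))
```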

Measuring AI performance on economically valuable tasks

03:49 - 12:17

Standard AI benchmarks often fail to measure what people actually do at work. Instead of focusing on math competitions or academic reasoning, Brendan and his team developed the AI Productivity Index to track how models handle real tasks in medicine, law, and finance. By collaborating with experts like Larry Summers and Cass Sunstein, they are shifting the focus to outcomes that have direct economic value. These experts provide a broad vantage point on their industries, helping to structure rigorous data sets that capture how a model might automate a legal draft or a financial analysis.

The largest disconnect that we were seeing in AI research is that everyone was focused on academic evals which were wholly disconnected from the outcomes that customers actually care about. We chose legal experts, medical experts, and finance experts to see what is the right methodology to think about measuring success across each of these domains.

The improvement in these models is rapid, with a growth rate of about 25 to 30 percent per year on economically valuable tasks. To determine these numbers, the team surveys experts from top consulting firms about how they spend their work hours. This data serves as a proxy for economic value. While current models struggle with tasks that require 50 to 100 hours of work or the use of multiple tools, Brendan expects these capabilities to arrive within the next six to 12 months.
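To get a feel for what 25 to 30 percent a year implies, here is a small compounding calculation. It assumes the figure is a constant relative improvement in some benchmark score, which the episode does not specify. A bounded score (such as the percentage of tasks solved) cannot compound indefinitely, so treat this as a short-horizon illustration only.

```python
import math


def project(score: float, annual_growth: float, years: int) -> float:
    """Compound a score by a constant relative growth rate (an assumption)."""
    return score * (1 + annual_growth) ** years


def doubling_time(annual_growth: float) -> float:
    """Years for a compounding score to double: ln 2 / ln(1 + g)."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    for g in (0.25, 0.30):
        print(f"{g:.0%} per year: x{project(1.0, g, 3):.2f} after three years, "
              f"doubling roughly every {doubling_time(g):.1f} years")
```

Under that reading, three years of progress multiplies the score by about 1.95 at 25 percent and about 2.20 at 30 percent, with doubling times of roughly 3.1 and 2.6 years.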

However, human expertise still holds an advantage in areas involving taste and uncodified knowledge. In law, for example, many approaches are not written down but exist in the heads of experts. Brendan suggests it might take two or three years before an expert like Cass Sunstein would struggle to find any error in a model's response. The difficulty lies in the fact that if specific expertise or nuances are not in the training data, the model will inevitably struggle with those problems.

The importance of evaluation rubrics in AI training

12:17 - 20:42

Deep domain experts can best contribute to AI advancement by defining evaluations. When researchers have high-quality tests for model capabilities in fields like law or economics, they can optimize the models much faster. Many experts outside Silicon Valley do not yet realize the importance of these tests. Others might fear how AI will impact their jobs. While raw data is useful, the most valuable data is a rubric or a way to measure success. This allows models to attempt a problem many times and learn from their failures.

The second kind of data is some way of measuring success where you have the rubric for the response, you have the test question, answer, you have the unit test and code. That second kind of data is the most valuable, where we are able to have the models attempt the problem many times, score those responses and learn from them.
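The "attempt the problem many times, score those responses and learn from them" loop can be sketched with a weighted rubric. In this sketch the rubric, its criteria, and the candidate responses are all hypothetical, and the learning step itself (updating the model) is omitted; the code only produces the per-attempt signal. Scoring each attempt against the group's average is similar in spirit to the group-relative baselines used by some RL fine-tuning methods, but it is not presented as any lab's actual method.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    name: str
    weight: float
    check: Callable[[str], bool]  # True if the response satisfies the criterion


# Hypothetical rubric for a short legal answer. Real rubrics are written by
# domain experts and usually graded by a person or a model, not string matching.
RUBRIC = [
    Criterion("cites_statute", 2.0, lambda r: "section" in r.lower()),
    Criterion("states_conclusion", 1.0, lambda r: "conclusion:" in r.lower()),
    Criterion("concise", 1.0, lambda r: len(r.split()) <= 60),
]

# Hypothetical attempts at the same question, e.g. sampled from a model.
ATTEMPTS = [
    "Conclusion: the clause is enforceable under Section 2-207.",
    "The clause is probably enforceable.",
    "Conclusion: enforceable.",
]


def score(response: str, rubric: list[Criterion]) -> float:
    """Weighted fraction of rubric criteria the response satisfies (0.0 to 1.0)."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric if c.check(response))
    return earned / total


def advantages(responses: list[str], rubric: list[Criterion]) -> list[float]:
    """Score each attempt relative to the group mean; positive means above average."""
    scores = [score(r, rubric) for r in responses]
    baseline = sum(scores) / len(scores)
    return [s - baseline for s in scores]


if __name__ == "__main__":
    for attempt, adv in zip(ATTEMPTS, advantages(ATTEMPTS, RUBRIC)):
        print(f"{adv:+.2f}  {attempt}")
```

The rubric plays the same role here that a unit test plays for code: it turns "is this good?" into a number that can be computed for every attempt.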

There is a unique challenge when applying this to creative fields like poetry. Brendan suggests that a rubric for poetry would reward certain styles or ideas while penalizing others. However, Tyler points out that philosophers like Immanuel Kant argued that taste cannot be captured in a rubric. If a rubric is impossible, AI labs can use human feedback. Experts with great taste can choose between two model outputs until the AI understands their preferences. This highlights a tension between the taste of the top one percent of experts and what a general user might enjoy.

It is challenging because that sometimes deviates from the types of responses that the top one percent of experts in poetry might say as a broadly good poem. Striking that balance is really up to the researchers and product leaders at the labs.
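When a rubric is impractical, the pairwise approach described above (an expert picks the better of two outputs) still yields scores. One standard way to turn such choices into a ranking is the Bradley-Terry model, in which the probability that output i beats output j is s_i / (s_i + s_j). The sketch below fits it with the classic minorization-maximization update on hypothetical comparisons. Reward models in RLHF are often trained with a loss of this Bradley-Terry form, but nothing here is claimed to be how any particular lab works.

```python
from collections import defaultdict

# Hypothetical expert choices: (preferred output, rejected output).
COMPARISONS = [
    ("A", "B"), ("A", "B"), ("A", "C"), ("B", "C"),
    ("C", "B"), ("A", "C"), ("B", "A"),
]


def bradley_terry(comparisons: list[tuple[str, str]], iters: int = 200) -> dict[str, float]:
    """Fit Bradley-Terry strengths with the MM update s_i = W_i / sum_j n_ij / (s_i + s_j).

    Returns strengths normalized to sum to 1.
    """
    items = sorted({x for pair in comparisons for x in pair})
    wins: dict[str, int] = defaultdict(int)
    games: dict[tuple[str, str], int] = defaultdict(int)  # comparisons between i and j
    for winner, loser in comparisons:
        wins[winner] += 1
        games[(winner, loser)] += 1
        games[(loser, winner)] += 1

    strength = {i: 1.0 for i in items}
    for _ in range(iters):
        updated = {}
        for i in items:
            denom = sum(games[(i, j)] / (strength[i] + strength[j]) for j in items if j != i)
            updated[i] = wins[i] / denom if denom else strength[i]
        total = sum(updated.values())
        strength = {i: v / total for i, v in updated.items()}
    return strength


if __name__ == "__main__":
    for item, s in sorted(bradley_terry(COMPARISONS).items(), key=lambda kv: -kv[1]):
        print(f"{item}: {s:.3f}")
```

The fit needs every output to have at least one win and one loss in a connected set of comparisons; otherwise some strengths drift to zero or infinity.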

Ultimately, the focus remains on economic value and practical utility. While modeling the style of historical poets like John Milton is possible, the priority is building tools that help modern users create work that gains traction and drives impact.

Training AI models on historical taste and personal data

20:42 - 26:25

Tyler expresses skepticism about contemporary poets. He finds many are too focused on postmodernism or identity. He prefers older poetry and questions why AI models should prioritize current consensus. Tyler suggests that we should instead enshrine the tastes of specific eras when they were at their peak. For example, movies might use evaluators from the 1960s, while heavy metal might rely on taste from the 1980s. Brendan believes that AI will eventually enshrine taste from every decade. This would allow a model to pull from various knowledge bases to personalize output to a user's preferences.

The notion that you enshrine current taste, when taste changes so much, is a very interesting decision. My guess is that in a long enough time horizon we will enshrine taste from every different decade and every different era. The model will be able to learn what taste you have and pull on each of those knowledge bases to best personalize it to your preferences.

Brendan argues that much of the economy will soon become a reinforcement learning environment. In this future, professionals may only need to perform a task once. An investment banker could teach a model their specific analysis process. This effectively builds a piece of software that can be used repeatedly. This shift moves knowledge work toward a fixed cost investment where agents perform the monotonous activities previously handled by humans.

We will move towards a world where people do things once. Instead of the investment banker redundantly analyzing a data room to prepare an analysis of a company every couple of weeks, they will teach the model how to do that once. Similar to building software once, they will be able to use that many times.

The potential for recording daily life via wearable pendants could accelerate this process. Tyler asks about the social and economic value of recording every conversation. Brendan suggests that for some individuals, this data could be worth hundreds of thousands of dollars. Privacy remains a major concern for most users. Tyler proposes using AI to filter out sensitive details before sharing data. Brendan notes that while this is possible, trust remains a significant barrier. Companies like Apple may have an advantage here because they have built a strong brand around privacy.

The future status of human experts

26:25 - 27:37

Tyler explores the reputation of human expertise in a world where AI models surpass human experts. One possibility is that machines will be respected more because they lack a personality. Human experts can often be perceived as annoying or biased when they appear on television. AI provides an impersonal distillation of knowledge. This detachment might actually elevate the perceived value of the experts who originally contributed to that knowledge.

The machine, by not being tied to a personality, is less disliked and people actually respect the experts more because they get this impersonal distillation of the experts. They are not annoying me like on the late night TV show.

Brendan notes that he already trusts AI models more than human experts in specific areas, such as quick medical advice. The fact that the technology is highly competent but lacks a face contributes to this trust. We often place more faith in a tool that does not have the personal baggage or visibility of a human representative.

The shift toward human expertise in model training

27:37 - 31:17

AI models are advancing rapidly and can already automate a significant portion of what experts do. However, they will likely struggle with the final 25 percent of tasks that require deep expertise. Human knowledge will remain the ultimate bottleneck to economic productivity for a long time. While models might be able to write poetry as well as the median Pablo Neruda poem within a year, the very best work remains much further out. The most difficult advancements lie in this long tail of capability.

I think that for a very long time human expertise will be imperative to help accomplish that last 25% as the ultimate bottleneck to more economic prosperity and productivity.

Models are currently superhuman within the constraints of a chat window, yet they still struggle with basic autonomous tasks like scheduling meetings or drafting emails. Brendan notes that while many in Silicon Valley focus on automating jobs away, the real shift will be toward a new job category: people training agents and building reinforcement learning environments. Instead of performing traditional analysis, investment bankers, consultants, and engineers will spend their time teaching models how to handle specific workflows. This shift could see a majority of high-end knowledge workers training models within five years.

Instead of the investment bankers doing the analysis, they'll build RL environments and train agents and it'll be the same across consulting and software engineers and customer support and pretty much every knowledge work vertical.

Workers will not necessarily need deep technical AI knowledge to hold these new roles. The primary skill required is domain expertise. A person simply needs to identify where a model makes a mistake and understand the frontier of its capabilities. By recognizing these errors and creating criteria to measure them, experts can provide the feedback necessary for the model to learn. This human oversight will become the primary way to improve models across long-term tasks that span hundreds of hours or days.

Price elasticity in software and capital allocation

31:18 - 33:15

The demand for software is highly price elastic. This elasticity determines how job displacement will evolve in the future. If software engineers become ten times more efficient, the industry might not shrink. Instead, there could be ten times more engineers building a hundred times more software. This is different from fields like accounting or customer support, where demand might be limited.

I think if we make software engineers 10 times more efficient, we'll have even more software engineers. Maybe we'll have 10 times as many software engineers and build 100 times as much software.
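The arithmetic behind Brendan's 10x/100x example can be made explicit with a constant-elasticity demand curve. Assume, for illustration only, that a productivity gain of m lowers the price of software by the same factor m, and that quantity demanded scales as price to the power of minus epsilon. Output then grows by m^epsilon and engineer headcount by m^(epsilon - 1). On these assumptions, his scenario of 10 times the engineers building 100 times the software corresponds to an elasticity of 2, while any elasticity below 1 means fewer engineers.

```python
import math


def software_and_jobs(productivity_gain: float, elasticity: float) -> tuple[float, float]:
    """Return (software output multiplier, engineer headcount multiplier).

    Assumes price falls by the productivity gain and demand has constant elasticity.
    """
    output = productivity_gain ** elasticity
    headcount = output / productivity_gain  # each engineer now produces `gain` times more
    return output, headcount


def implied_elasticity(productivity_gain: float, headcount_multiplier: float) -> float:
    """Invert headcount = gain ** (elasticity - 1) for the elasticity."""
    return 1 + math.log(headcount_multiplier) / math.log(productivity_gain)


if __name__ == "__main__":
    print(software_and_jobs(10, 2.0))    # Brendan's scenario
    print(software_and_jobs(10, 0.5))    # an assumed inelastic field
    print(implied_elasticity(10, 10))
```

The same function makes the contrast with accounting or customer support concrete: with an assumed elasticity of 0.5, a 10x productivity gain yields about 3.2 times the output and only about 0.32 times the headcount.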

Price elasticity also applies to building and distributing businesses. Capital allocation currently suffers from enormous inefficiency, especially in early stage investing. Brendan notes that early on at Mercor, it was difficult to secure even ten thousand dollars of working capital, yet markets become very capitalized once a company reaches scale. Better analysis will change how information manifests within companies. It will help operators treat internal resource management like an investing problem, allowing them to better understand where to place their bets.

The future of AI in education

33:15 - 34:41

In the next five to ten years, education will likely transform through the availability of AI tutors. Brendan suggests a future where everyone has a personal tutor available at any time to teach any topic. This access makes it much easier for students to motivate themselves because the information is explained better and is more accessible than ever before.

If everyone has Sal Khan as their personal tutor available 24/7 to teach them whatever topic they want to learn, it will be that it is much easier to motivate themselves. It is much better access to information, much better ways of explaining that information and that will be profoundly impactful.

While AI can handle many aspects of instruction, Brendan believes the role of the teacher remains essential. Teachers provide personal relationships and guide students through both their academic journey and emotional development. Instead of disappearing, the teaching profession may shift toward smaller class settings and more frequent, high-quality contact between teachers and students.

How Mercor used its own technology to scale its team

34:41 - 35:23

Mercor currently employs over 300 people around the world. Brendan attributes this rapid growth to the company's own technology platform. The business began by automating how they reviewed resumes and conducted interviews.

The origin story of the company was automating all of the ways that we would review resumes, conduct interviews and decide who to hire. The ways that we assess talent, the ways that we optimize funnels to build out teams, is really ingrained in the DNA of the company.

The founders prioritize talent assessment and optimizing the hiring process. This focus is part of the company's core identity. Using their own tools helps them build strong teams more efficiently.

Rethinking talent assessment and the future of interviews

35:24 - 39:17

Many people interview job candidates incorrectly. They fail to measure the specific skills needed for the role. Instead of testing an applicant's ability to analyze data or complete a task, they focus on personality and background. This creates a vibe space conversation where interviewers look for people similar to themselves. The best way to evaluate talent is to give them a project and grade the result.

Hiring for non-technical roles like legal or communications is more difficult. In these situations, the best proxy for performance is looking at similar past work. Interviewers should investigate the details of a candidate's previous environment and speak with their former colleagues. It is much easier to measure where a person is today than to predict their future growth over time.

Body language can be a false signal. I have had cases where I over index on a person feeling a little bit awkward, but they do a phenomenal job at the actual work. It is important to be cautious around which signals actually correlate with performance.

Brendan believes that AI will eventually outperform humans in talent assessment. Humans are generally not very good at interviewing. While noise like family issues or health can make interviews inconsistent, AI can eventually use manager notes and long-term data to provide better predictions. This could lead to a phenomenal system within ten years.

AI and the future of labor market efficiency

39:18 - 42:53

Labor markets are currently inefficient because information is scattered. Candidates only apply to a small number of roles and employers only see a tiny fraction of the available talent. A central aggregator powered by AI could solve this by creating a perfect flow of information. While platforms like LinkedIn have wide reach, they struggle to predict how well a person will actually perform in a specific job. This makes hiring a matching problem rather than just a distribution problem.

As the nature of work shifts toward remote and fractional roles, the potential for global matching increases. However, Tyler suggests that if AI helps everyone create perfect resumes and practice interviews, employers might revert to nepotism and personal recommendations to filter candidates. Brendan hopes that instead, companies will use models trained on actual performance data to make objective hiring decisions. These models could even identify when a personal reference is a poor predictor of success.

My hope is that we have models that are helping to run companies in a very thoughtful, efficient way that are data driven about it, where the models have an eval set of all of the performance reviews of people in that given company and they're able to make an accurate prediction over whether this reference or that piece of nepotism should actually be considered, or maybe treated as a counter signal.

This automated tracking might seem like it would harm late bloomers or people who need second chances. Brendan believes the opposite is true. Because AI can identify why someone failed in a previous role, it can find a different environment where they might excel. Many jobs in the economy are roles where almost anyone could succeed if they are matched with the right intersection of interest and economic value.

How AI tools are transforming talent assessment and selection

42:53 - 47:27

In talent assessment, people initially tried to fight AI. They would have candidates write essays on paper to prevent the use of tools like ChatGPT. Brendan believes the right approach is to see what people can achieve when they have access to all available technology. If a candidate uses powerful code generation tools to build a product in an hour, that provides a more accurate picture of their real world potential.

That's a far better predictor of this person's ability to actually deliver impact than it is to say, don't use the tools at all.

The Thiel Fellowship faces a matching problem because it can only consider a small percentage of potential candidates. Brendan has worked with the fellowship to build AI interviews that analyze transcripts for signals of success. While traditional referrals are valuable, the goal is to find unconventional thinkers from every part of the world who might not otherwise get a meeting with a venture capitalist. Scaling this kind of elite vetting could drastically improve economic mobility.

Imagine if we could have Peter interview everyone in the world when they're 18 or 20 or whatever the age is and make a decision around whether he wants to give them 100k check. That would probably be very powerful with respect to economic mobility and how many companies we're able to create.

Technology might eventually allow for a system where a panel of digital domain experts interviews everyone at a young age to determine who should receive funding. While a single human interviewer like Peter might only be effective for a specific subset of people, an aggregate of digital experts could provide high quality assessments for every industry at a massive scale.

Brendan on the value of global travel

47:27 - 48:41

Brendan has worked 100 hours a week for the past three years. If he were granted a year of freedom from his company and AI, he would choose to travel. He wants to understand how perspectives vary across different countries and geographies. This global perspective is valuable for understanding how people view major shifts in the world.

I do think that seeing the world and getting more of this understanding of how do perspectives vary by country and geography, how are people thinking about AI differently elsewhere is really interesting. I think that global perspective is incredibly valuable and informative.

Brendan recalls how Sam Altman traveled the world after ChatGPT was released to see its impact on different regions. He believes this broad view is informative and provides a deeper understanding of how the world is changing.

The value of travel in building professional relationships

48:41 - 49:43

Brendan expresses a strong interest in visiting Japan, but the conversation moves toward the deeper professional advantages of international travel. Tyler suggests that travel provides a unique kind of context for understanding people, comparing it to how reading classic literature gives one a taste for different eras of history. For those working in global tech hubs like the Bay Area, visiting countries like India is particularly important because it allows for a better understanding of where colleagues and employees are coming from.

If your model has the poetic taste of different eras, of John Milton, Wordsworth, Shakespeare, whatever, traveling is an individual's version to get some version of that.

Brendan agrees that these experiences are vital for establishing rapport quickly. Having first-hand knowledge of a person's home country creates an immediate point of connection. This familiarity helps in building trust and strengthening relationships across a diverse workforce.

Being able to connect with those individuals very quickly around, hey, I've been to this place and I'm very familiar with India and all these different things is really helpful in building relationships and setting up trust across all the different people that we work with and interact with.

Brendan Foody on early entrepreneurship and the art of speaking

49:43 - 53:09

Brendan started a donut business in 8th grade. He bought donuts from Safeway for five dollars a dozen and sold them for two dollars each at school. When the principal shut him down, he moved the stand just outside school grounds where he could not be policed. He even paid his mother twenty dollars a week to drive him in her minivan. He based this price on the cost of an Uber ride.

I'd pay my friends in donuts because I perceive the cost of the donuts as my cost basis versus they perceived it as two dollars each. And so I had a little bit of arbitrage in the salaries.
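The "arbitrage in the salaries" is easy to quantify from the two prices Brendan gives: five dollars per dozen at Safeway and two dollars per donut at the stand. The number of donuts paid per shift is a hypothetical parameter; the episode does not give it.

```python
from fractions import Fraction

DOZEN_COST = Fraction(5)     # dollars per dozen at Safeway (from the episode)
RETAIL_PRICE = Fraction(2)   # dollars per donut at the stand (from the episode)
UNIT_COST = DOZEN_COST / 12  # owner's cost basis per donut: 5/12, about $0.42


def wage_in_donuts(donuts: int) -> tuple[Fraction, Fraction]:
    """Return (cost to the owner, value perceived by the friend), in dollars."""
    return donuts * UNIT_COST, donuts * RETAIL_PRICE


if __name__ == "__main__":
    cost, perceived = wage_in_donuts(6)  # hypothetical: six donuts for one shift
    print(f"owner pays ${float(cost):.2f}, friend perceives ${float(perceived):.2f}")
    print(f"perceived value per dollar of cost: {float(RETAIL_PRICE / UNIT_COST):.1f}x")
    print(f"margin per donut sold: ${float(RETAIL_PRICE - UNIT_COST):.2f}")
```

Every dollar of cost delivers 4.8 dollars of perceived wages, and each donut sold earns about $1.58 over its cost. The sketch ignores other expenses, such as the twenty dollars a week he paid his mother for rides.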

Brendan also managed competition through aggressive pricing. When a rival started selling high-end donuts, he dropped his prices for two weeks to drive them out of business. This was his first experience with anti-competitive dynamics. He later joined his high school policy debate team, where he met his co-founders. They were an incredibly successful team and won several national tournaments together.

Success in speaking often comes down to more than just intelligence. High clarity of thought and confidence are essential for improvement. There is also a distinction between the speed of thought and the depth of thought. Some people have high aptitude but think slowly and deeply. Others are quick on their feet and can react instantly. Brendan views himself as a deep and slow thinker.

Dyslexia, delegation, and playing to one's strengths

53:10 - 55:16

There is a strong correlation between dyslexia and entrepreneurship. Brendan finds that his brain works differently than many of his peers. While some people can read through evidence very quickly, he has learned to approach problems with more creativity and unconventionality. This cognitive difference provides a unique advantage when starting a company.

There are certain ideas or ways of approaching a problem that are just different, that enable more creativity, potentially being unconventional in doing so. And I think that that is one advantage I've had.

Tyler suggests that the mechanism behind this success is early delegation. Dyslexic individuals often must ask others for help with reading tasks from a young age. This forces them to develop a skill that many competent people do not learn until much later in their careers. Brendan agrees that this necessity encourages a focus on the big picture, which is essential for a founder.

Struggling in specific school environments can be a humbling experience. It teaches the importance of understanding personal strengths rather than obsessing over weaknesses. Brendan applies this philosophy at his company. He encourages employees to focus on their comparative advantages. Career success comes from leveraging phenomenal strengths rather than trying to be perfect at everything.

Brendan Foody on San Francisco culture and Jesuit influences

55:17 - 59:04

Brendan spends most of his time within his company. Since many of the employees are around 22 years old, he stays connected to that specific demographic. He acknowledges that he has spent less time with people his own age than he would have if he stayed in college. Regarding the idea of a dating crisis for young men, Brendan believes the issue is particularly acute in San Francisco due to gender imbalances in certain industries. However, he remains an optimist about technology's role in romance.

I am very much a proponent of better technology to solve these matching problems and enable people to be happy in their lives.

Brendan carries the last name Foody and shares a love for good food that he inherited from his father. For Mexican food in San Francisco, he recommends El Metate. For higher-end dining, he suggests spots like Cotogna and Quince. He finds that the app Beli is the most accurate guide for finding good food in the city because it has a high density of local users.

The company name Mercor comes from the Latin word for marketplace. This name reflects the goal of building the largest marketplace in the world. Brendan and his co-founders attended a Jesuit high school where they developed an interest in Latin roots. Although Brendan is not Catholic, he credits the school with helping him focus on productive goals like speech, debate, and building companies.

My mom was concerned about whether I would start selling drugs when I was doing my donut stand in eighth grade because it is an easy step. I like to think that Catholic school helped instill good values in what I should care about.

Measuring AI utility and the future of labor

59:04 - 1:00:46

The next major goal for the company involves scaling realistic evaluations of AI models. Brendan wants to measure how these models use various tools to complete tasks that would typically take a human days or even weeks to finish. This shift is crucial for businesses that need models to be practically useful rather than just theoretically intelligent.

The next goal for the company is really in scaling up a lot of these super realistic evaluations. How do we measure the ways that models use all sorts of different tools on trajectories that would take someone days or weeks to do is a big focus for us.

Many people focus on the concept of intelligence, but Brendan is more interested in bridging the gap between what enterprises need and what models can actually do. He also wants to explore the intersection of labor markets and AI research. Specifically, he is looking at how to apply human talent to frontier AI problems in more efficient ways to improve model training.

People have been very focused on the idea of intelligence rather than the idea of models being useful and bridging the gap between what do enterprises actually want to use.