Dylan Patel, founder of SemiAnalysis, explains the three major bottlenecks of logic, memory, and power that limit AI scaling.
He breaks down the complex economics of hardware manufacturing and the massive infrastructure needed to support the next generation of models.
These physical constraints determine which tech giants will lead the industry and how fast artificial intelligence can actually progress.
Key takeaways
- AI labs may need to more than double their compute capacity within a single year just to support the inference demands of their projected revenue growth.
- The value of hardware is tied to the utility of the models it runs. An H100 is worth more today than it was years ago because it can now run more efficient, higher-quality models.
- The Alchian-Allen effect suggests that as the fixed cost of compute rises, users will gravitate toward the highest quality AI models because the relative price gap narrows.
- The ultimate constraint on AI scaling by the end of the decade will be the production capacity of EUV lithography machines by ASML.
- While data centers can be built in under a year, semiconductor fabs require two to three years of construction, creating a significant lead time disparity.
- The AI supply chain faces a major bottleneck where $1.2 billion in tooling can hold up $100 billion in total market value.
- Nvidia demonstrates extreme financial leverage by turning a small fraction of TSMC's hardware investment into $160 billion in annual revenue.
- Unlike other hardware monopolies, ASML links its price increases directly to improvements in machine throughput and accuracy rather than pure market demand.
- A production gap exists because hardware suppliers are often more cautious about future demand than the AI companies who are actually using the chips.
- The primary challenge in scaling AI is the efficiency loss that occurs when moving data between hundreds of interconnected chips.
- AI demand is causing a memory crunch that could triple the component costs for smartphones, leading to significantly higher retail prices.
- AI data centers can access more power by using batteries to manage peak loads, which unlocks the 20% of the grid that usually stays idle.
- Power costs are a minor factor in AI infrastructure because the value generated by improved models far exceeds the expense of even doubling electricity prices.
- Modularization allows data centers to scale despite labor shortages by shifting complex wiring and plumbing from construction sites to specialized factories.
- GPUs have high failure rates, often requiring physical repairs that make remote or space-based deployments economically and logistically risky.
- Compute efficiency gains from research can make model training ten times cheaper annually. Because of this, labs prioritize research over massive pre-training to achieve the fastest possible technological takeoff.
- Apple is losing its status as TSMC's most favored customer as AI companies begin prepaying for chip capacity and manufacturing costs.
- Centralizing robot intelligence in data centers allows companies to bypass the intense competition for high-end chips at major factories like TSMC.
- The semiconductor industry has a circular dependency where the tools needed to make chips require the very chips they produce.
- The bottleneck for AI development has shifted from chip design to securing the entire infrastructure stack, including power and land.
The timeline of Big Tech and AI lab investments
Big Tech companies like Amazon, Meta, Google, and Microsoft have a combined forecasted capital expenditure of 600 billion dollars. While this represents an enormous amount of potential computing power, the money is not all being spent on hardware today. Dylan explains that much of this capital is allocated to infrastructure that will not come online for several years. This includes deposits for power turbines and construction costs for data centers planned for 2027 and beyond.
A big chunk of that is spent on turbine deposits for '28 and '29. A chunk of that is spent on data center construction for '27. A chunk of that is spent on power purchasing agreements and down payments so they can set up this super fast scaling.
AI labs like OpenAI and Anthropic are the primary drivers of this demand. Their need for computing power scales rapidly with their revenue. For instance, if Anthropic maintains its current growth, it might need four gigawatts of capacity just for inference. This estimate assumes their research and development needs stay the same. To meet these goals, Dylan suggests that Anthropic needs to reach more than five gigawatts of capacity by the end of the year.
The compute strategies of Anthropic and OpenAI
Anthropic now faces a compute shortfall. Dylan explains that the company was deliberately conservative with its scaling strategy: Dario wanted to keep the company financially responsible and avoid any risk of bankruptcy. This principled approach meant they undershot their compute needs, and they may now need to turn to newer providers to find the capacity they require.
Anthropic was a lot more conservative. They would sign contracts but stay principled. They purposely undershot what they thought they could possibly do because they did not want to potentially go bankrupt.
OpenAI took a much more aggressive path, signing massive deals with many different players. Beyond their partnership with Microsoft, they secured capacity from Google, Amazon, Oracle, and CoreWeave. They even partnered with SoftBank Energy, a company with no prior experience building data centers. This strategy gave OpenAI far more access to compute by the end of the year.
There was a period of financial concern in the market regarding these deals. Investors worried that OpenAI signed contracts they could not afford. This caused stock prices for some providers to drop. However, OpenAI raised enough money to pay for their commitments. While Anthropic avoided this risk, they now have less computing capacity than their rival.
The economics of acquiring emergency AI compute
Acquiring compute in a rush often means paying high margins to providers. While companies like OpenAI usually sign five-year deals, other customers might have shorter contracts. When those shorter deals expire, the most aggressive participants in the market compete for the capacity. Dylan explains that some AI labs have paid as much as $2.40 per hour for H100s on two or three-year deals. This is significantly higher than the standard cost to build and operate the hardware over a longer period.
Those margins are way higher. And so now you can crowd out all of these other suppliers, whether it is Amazon or CoreWeave or Together AI.
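To see why a rate like $2.40 per hour represents a steep markup, here is a back-of-envelope sketch. The capital cost, power price, and utilization below are illustrative assumptions, not figures from the conversation.

```python
# Hourly cost to own and operate one H100 vs. a distressed rental rate.
# All inputs are illustrative assumptions, not figures quoted in the episode.

capex_per_gpu = 30_000          # USD, GPU plus its share of server and networking (assumed)
useful_life_years = 5           # assumed depreciation horizon
power_draw_kw = 1.0             # ~700 W GPU plus cooling and host overhead (assumed)
electricity_usd_per_kwh = 0.10  # blended power and cooling cost (assumed)
utilization = 0.90              # fraction of hours the GPU is actually sold

hours = useful_life_years * 365 * 24 * utilization
owner_cost_per_hour = capex_per_gpu / hours + power_draw_kw * electricity_usd_per_kwh

spot_rate = 2.40  # USD/hour, the rate Dylan cites for rushed two- or three-year deals
print(f"owner cost ~${owner_cost_per_hour:.2f}/hr")
print(f"spot rate   ${spot_rate:.2f}/hr -> ~{spot_rate / owner_cost_per_hour:.1f}x markup")
```

Under these assumptions the owner's all-in cost lands under a dollar an hour, which is why distressed buyers paying $2.40 hand the provider an unusually fat margin.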
Providers often have most of their new capacity sold out long before it goes online. However, some spare capacity exists when hyperscalers or neo-clouds have not yet sold their upcoming inventory or when they repurpose hardware intended for internal use. For a company like Anthropic, the choice is often between paying a massive markup for spot compute or entering revenue-sharing agreements with partners like Amazon or Google.
Having the best model at any given moment is a temporary advantage, but it provides the financial leverage to secure future infrastructure. When a model's revenue grows quickly, the company can convince providers to sign massive deals. This allows them to lock in compute in advance at better prices than their competitors who might still be struggling to raise funds or prove their capabilities.
The shifting economics of GPU depreciation
Financial experts often debate the depreciation cycle of GPUs. Some skeptics argue that because technology advances so quickly, a GPU might only have a two or three year lifespan before it becomes obsolete. This perspective suggests that as new chips enter the market with much higher performance, the rental price for older models like the H100 should fall quickly.
Dylan Patel offers a different lens based on utility rather than just comparative performance. While newer chips are more efficient, the extreme limit on semiconductor supply means the value of a chip is determined by the value it can create today.
The price of a GPU would continue to fall. That is like one lens. The other lens is what is the utility you get out of the chip. Because you are so limited on semiconductors and deployment timelines, you end up with actually what prices these chips is not what is the comparative thing I can buy today. It is actually what is the value I can derive out of this chip today.
This utility increases as models become more efficient. For instance, a new model might be much cheaper and better to run than a previous version. Because an H100 can serve more tokens of a higher-quality model, the hardware actually becomes more valuable over time. Dylan notes that an H100 is worth more today than it was three years ago purely because of software advancements. If a chip can eventually host human-level intelligence, the financial math shifts: a human worker can produce six figures of value per year, so if a single H100 could produce that output, the chip would pay for itself in months.
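A minimal sketch of that payback math, using the "six figures of value per year" framing from the conversation and a hypothetical $30,000 all-in price for the chip:

```python
# If one H100 could produce the output of a six-figure knowledge worker,
# how fast would it pay for itself? Purely illustrative assumptions.

gpu_cost = 30_000          # USD, assumed all-in price for one H100
value_per_year = 150_000   # USD, "six figures" of human-equivalent output

payback_months = gpu_cost / (value_per_year / 12)
print(f"payback in ~{payback_months:.1f} months")   # ~2.4 months
```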
The economics of compute and the Alchian-Allen effect
Dario from Anthropic has suggested we are only a few years away from data centers full of digital geniuses. These models could earn trillions of dollars. If this is true, it seems inconsistent to be conservative about buying compute. Dylan notes that as models become more powerful, the value of each GPU increases. This creates a strong incentive to commit to hardware early.
Anthropic is sometimes seen as having commitment issues with hardware. An economic principle called the Alchian-Allen effect helps explain the dynamics: when the same fixed cost is added to two similar goods, demand tends to shift toward the higher-quality option. For example, if a high-quality apple and a low-quality apple both see the same price increase, the relative price difference shrinks.
The whole effect is that if there is a fixed cost that is applied to both, the relative price, the price difference between them, the ratio changes. Previously, the more expensive one was 2x more expensive. Now it is just 1.5x more expensive.
In the AI world, if GPUs become more expensive, customers will likely pay higher margins for the very best models. They are already paying a massive fixed cost for compute, so they might as well ensure they are using the superior model. Companies that signed long term contracts for compute years ago now have a significant margin advantage. They locked in prices before the current demand surge. Most of the market is now in these long term deals. The entities holding the most power are those that control the supply chain, such as Nvidia and memory vendors. These providers have secured capacity and long term contracts. While model vendors are currently capacity constrained, they may have to raise prices to manage demand.
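The apple example maps directly onto the quote above. A small sketch, with made-up prices chosen only to reproduce the 2x-to-1.5x shift Dylan describes:

```python
# Alchian-Allen effect: adding the same fixed cost to two goods shrinks their relative price gap.
premium, budget = 2.00, 1.00   # hypothetical apple prices (USD)
fixed_cost = 1.00              # the same "fixed cost" (shipping, or compute) applied to both

before = premium / budget                                # 2.0x more expensive
after = (premium + fixed_cost) / (budget + fixed_cost)   # only 1.5x more expensive
print(f"ratio before: {before:.1f}x, after: {after:.1f}x")
```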
Nvidia's strategic dominance in the 3nm supply chain
Nvidia is on track to control a massive portion of the 3nm wafer capacity by 2027. This dominance is unique because other sectors of the AI industry typically try to fracture their supply chains to avoid leverage. Cloud providers distribute compute to different startups and AI labs like OpenAI seed many data providers to avoid being locked into one source. Yet, in the semiconductor world, TSMC is allowing Nvidia to take a significant lead.
TSMC has a complex calculation for allocating its 3nm process. Apple previously dominated this space, but they are moving toward 2nm as rising memory prices squeeze their margins. Dylan notes that TSMC actually prefers the CPU business, like Amazon Graviton, because it offers stable long term growth. They view the AI chip market as more cyclical and are often more hesitant to allocate capacity to accelerators like Amazon Trainium or Google TPU.
Nvidia's success comes down to being more aggressive and acting earlier than its competitors. While Google and Amazon faced delays or internal hesitation, Nvidia signed non-cancelable contracts and paid deposits to lock down the supply chain. They verified capacity with every vendor, from PCB suppliers in China to memory manufacturers, to ensure they had everything needed for their next generation of chips.
Nvidia was way more AGI pilled than Google was at Q3 of last year, or Amazon was at Q3 of last year. He saw way more demand. You can see all the data center construction. Google especially, even though their TPU is just better for them to deploy, they have to deploy a crapload of GPUs because they don't have enough TPUs to fill up their data centers. They can't get them fabbed.
Dylan points out that this aggressiveness stems from being more AGI pilled than the big cloud providers were a year ago. Even if Jensen does not fully believe software will be entirely automated, he moved faster than Google or Amazon to secure the infrastructure. This lead has created a situation where Google has to deploy Nvidia GPUs simply because they cannot get enough of their own TPUs manufactured to fill their data centers.
Google's shift to aggressive AI infrastructure
Google made a surprising decision to sell a massive amount of TPU capacity to Anthropic instead of keeping it for its own internal lab, DeepMind. This happened because of a misalignment between Google executives and the actual needs of the AI labs. Anthropic saw a dislocation in the market and negotiated a deal to get access to this computing power before Google realized its value. Over a six-week period in early Q3, orders for these chips rose so sharply that Google even had to explain the sudden increase to its manufacturer, TSMC.
The main people on the compute team at Anthropic saw this dislocation and negotiated a deal. They were able to get access to this compute before Google realized. It is pretty clear to me that Google screwed up.
Once Google saw its own user metrics skyrocket with Gemini, leadership tried to increase their chip orders. However, they found that capacity was already sold out for the following year. This delay forced a major shift in Google's strategy. They have since become highly focused on artificial general intelligence and are moving aggressively to secure infrastructure. This includes buying energy companies, putting deposits down for turbines, and negotiating long-term utility agreements to power new data centers.
Since then Google has gotten absurdly AGI pilled in terms of what they are doing. They bought an energy company, they are putting deposits down for turbines, and they are buying a ridiculous percentage of the powered land.
The shifting bottlenecks of AI semiconductor manufacturing
The primary bottleneck for scaling AI compute has shifted over time. It started with packaging technologies like CoWoS and moved to power and data centers. Looking five years out, the constraint returns to the semiconductor supply chain itself. Building a fab takes two to three years, while companies can build a data center in as little as eight months. This difference in lead times creates a massive lag in capacity as the industry tries to keep up with demand.
The bottlenecks as we have scaled have shifted from what the supply chain is currently not able to do, which was CoWoS and power and data centers. But those were all shorter lead time items. Power and data centers are ultimately way more simple than the actual manufacturing of the chips.
In the past, the industry could shift capacity from mobile phones and PCs to AI chips. Now that Nvidia is the largest customer for major manufacturers like TSMC and SK Hynix, there is no more room to divert existing resources. By the end of the decade, the lowest rung of the supply chain will be the main bottleneck. This is ASML, the company that produces EUV machines. These tools are the most complicated machines in the world and cost hundreds of millions of dollars. ASML currently makes about 70 units a year and might reach 100 by 2030.
It would be very interesting if there is an absolute gigawatt ceiling that you can project out to 2030 based just on the fact that we cannot produce more than this many EUV machines.
Dylan explains that manufacturing a gigawatt of compute requires about 2 million EUV passes across various wafer types. This translates to needing about three and a half EUV tools per gigawatt. While a gigawatt of data center capacity costs around 50 billion dollars, the lithography tools required for it cost significantly less at roughly 1.2 billion dollars. Despite the lower relative cost, the limited production of these machines defines the upper limit of what can be built.
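Dylan's figures imply a simple set of ratios. The sketch below derives the effective output per tool from his two stated numbers; the per-tool figure is implied, not quoted directly.

```python
# Rough EUV math implied by the figures above: ~2M EUV passes per gigawatt,
# ~3.5 tools per gigawatt, $1.2B of lithography inside a ~$50B gigawatt build.
euv_passes_per_gw = 2_000_000
tools_per_gw = 3.5
litho_spend_per_gw = 1.2e9
datacenter_cost_per_gw = 50e9

passes_per_tool_per_year = euv_passes_per_gw / tools_per_gw   # implied effective throughput
tool_price_usd = litho_spend_per_gw / tools_per_gw            # implied average tool price
litho_share = litho_spend_per_gw / datacenter_cost_per_gw

print(f"~{passes_per_tool_per_year:,.0f} EUV passes per tool per year (implied)")
print(f"~${tool_price_usd / 1e6:.0f}M per tool (implied); litho is {litho_share:.1%} of a gigawatt build")
```

The output makes the leverage obvious: lithography is only a couple of percent of the cost of a gigawatt, yet it sets the ceiling on how many gigawatts can exist.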
The massive leverage in the AI supply chain
A massive amount of AI value is currently stuck in the supply chain. While the total value reaches $100 billion, it is held up by just $1.2 billion worth of tooling that cannot expand quickly enough. This creates a significant bottleneck for the entire industry.
Dylan explains the scale of this leverage using TSMC and Nvidia as examples. Over the last three years, TSMC spent $100 billion on capital expenditures. Nvidia uses only a small fraction of that investment for its 3 nanometer and 4 nanometer chips. Despite using such a small portion of the infrastructure, Nvidia generated $40 billion in revenue just last quarter, a pace that suggests an annual run rate of $160 billion.
Nvidia alone is turning some small fraction of 100 billion in CapEx that is going to be depreciated over many years into $160 billion in a single year.
This intensity increases as you look further down the supply chain. ASML provides a billion dollars worth of machines to produce significant output. Because these machines last for many years, the value they create far exceeds their initial cost. The ability to turn relatively small investments in hardware into massive annual returns defines the current state of the AI economy.
ASML and the scale of AI infrastructure
The numbers behind Sam Altman's vision for massive AI infrastructure appear compatible with current manufacturing trends. The global ecosystem currently holds about 250 to 300 EUV tools. With production scaling to 100 tools per year by 2030, the total count will reach approximately 700. This capacity could support 200 gigawatts of AI chips for data centers. Altman's goal of 50 gigawatts a year would only require about a 25% share of that total fabrication capacity.
Sam wants 50 gigawatts a year. He is only taking 25% share. That is very reasonable given this year alone he will have access to 25% of the Blackwell GPUs that are deployed. It is not that crazy.
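Those shares hang together with the tools-per-gigawatt estimate from the previous section. A quick sketch using the numbers already cited:

```python
# Ceiling implied by the EUV fleet (all figures from the discussion above).
fleet_2030 = 700          # ~250-300 tools installed today plus ~100/year through 2030
tools_per_gw = 3.5        # from the ~2M-passes-per-gigawatt estimate earlier

supportable_gw = fleet_2030 / tools_per_gw    # ~200 GW of AI datacenter chips
altman_gw_per_year = 50

print(f"{fleet_2030} tools -> ~{supportable_gw:.0f} GW of capacity")
print(f"50 GW/yr is ~{altman_gw_per_year / supportable_gw:.0%} of that")
```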
While the most advanced machines in this industry can last a decade, they are not stagnant. ASML has rapidly improved specifications like overlay accuracy and wafer throughput. Overlay is the ability to stack layers of circuits precisely on top of each other after many processing steps. Although tool prices have risen from 150 million to 400 million dollars, Dylan Patel points out that their capabilities have more than doubled. ASML is also notably generous compared to other tech giants. They have a total monopoly on EUV technology yet they have not increased prices or margins as aggressively as Nvidia.
ASML has never raised the price more than they have increased the capability of the tool. In a sense, they have always provided net benefit to their customer. It is not that the tool is stagnant.
Expanding production is difficult because the supply chain is not yet fully convinced of the massive demand for AI. Many suppliers find high forecasts hard to believe. Additionally, the complexity of the machines creates long time lags. An EUV tool has four major components manufactured in different locations, including San Diego and Connecticut. One critical part is the source, which uses a laser to hit falling tin droplets with incredible precision.
The complexity of EUV lithography and its supply chain
The process of creating advanced chips starts by blasting falling tin droplets with a high-power laser to release extreme ultraviolet light. This light is directed through a complex optical stack whose components are actually multi-layer mirrors, built from dozens of alternating molybdenum and silicon layers and typically capped with ruthenium. Any defect in these thin layers will ruin the entire system. Because the precision is so high, the production process is artisanal. Only a few hundred tools are made each year.
The reticle is moving one direction and the wafer is moving the other direction. As it scans a 26 by 33 millimeter section of the wafer, it stops and shifts to another part. It does that in just seconds. Each of them are moving at nine Gs in opposite directions.
The machines are massive and require extreme metrology to ensure everything is perfect. If one part is slightly off, the yield drops to zero. Dylan notes that these tools are built in the Netherlands and then deconstructed to be shipped on many planes. Reassembling and testing them at a customer site takes months. This complexity makes it impossible to increase production quickly.
The supply chain involves over 10,000 individual suppliers. Many of these companies do not share the same urgency as AI firms. While companies like OpenAI and Anthropic are desperate for more hardware, the suppliers further down the chain are more cautious. They often build less than what is actually needed because they are not fully convinced of the artificial general intelligence timeline.
OpenAI and Anthropic know they need a certain amount. Nvidia is not quite as AGI pilled and they are building a bit less. As you go down the supply chain, everyone is doing even less because they are not AGI pilled. You end up with a time lag for this to react.
Supply chain limits on AI energy goals
The energy requirements for AI are growing at a pace that the current supply chain may not be able to support. Hardware is becoming more powerful and power-hungry, with components moving from 500 watts to 1,000 watts. Major tech leaders have set incredibly ambitious goals for the end of the decade. Elon Musk aims for 100 gigawatts a year for space projects, while Sam Altman targets over 50 gigawatts for his ventures. When these goals are combined with the needs of other giants like Google and Anthropic, a significant problem emerges.
The supply chain can't possibly build enough capacity for everyone to get what they want. You look at what Elon wants, what Sam Altman wants, and Google needs the same. You go across the supply chain and the numbers just do not work.
There is a disconnect between the desired increase in production and the technical reality of the supply chain. Even with improvements in tools and faster production cycles, the sheer scale of the required energy infrastructure is unprecedented. The industry is facing a future where demand for power vastly exceeds the capacity to build the necessary systems.
The limitations of scaling AI with older chip processes
Scaling AI compute by using older semiconductor processes, such as 7 nanometer chips, may seem like a viable solution to supply chain bottlenecks. However, this approach is often naive because it ignores how modern chips are designed. Simply comparing the raw floating point operations of different chip generations is not a fair assessment. Each generation is optimized for different numerical formats. While an older chip might be designed for FP16, a modern one like Blackwell is optimized for FP4 or FP6. These design targets drastically change the performance reality.
Every time you cross the barrier of a chip to another chip, there is an efficiency loss because you now have to transmit over high speed electrical serdes. There is a latency cost, there is a power cost, and all these dynamics that hurt.
Dylan explains that the true gating factor for performance is not just the individual chip but how hundreds of chips work together. Large models do not run on a single processor. They are split across many GPUs, and moving data between those chips creates massive efficiency losses. Using older, less efficient processes increases the latency and power costs of these connections. This makes it much harder to serve production traffic effectively compared to using the latest technology.
The challenges of chip communication and scaling
As process nodes shrink, the amount of compute on a single chip increases significantly. However, a major challenge is the speed at which data moves. On a single chip, data can travel at hundreds of terabytes per second. Once that data has to move between chips or across different racks in a data center, the speed drops drastically. This creates a ladder of performance where communication gets slower the further the data has to travel.
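To put rough numbers on that ladder, here are approximate bandwidths for an H100-class deployment. These are ballpark public figures used for illustration, not numbers quoted in the conversation.

```python
# Approximate bandwidth "ladder" for an H100-class deployment (ballpark figures).
bandwidth_gb_per_s = {
    "on-chip / HBM":              3350,  # ~3.35 TB/s of HBM3 per GPU
    "chip-to-chip (NVLink)":       900,  # within the scale-up domain
    "rack-to-rack (InfiniBand)":    50,  # ~400 Gb/s per NIC
    "long-haul / WAN":               1,  # order-of-magnitude placeholder
}

hbm = bandwidth_gb_per_s["on-chip / HBM"]
for hop, bw in bandwidth_gb_per_s.items():
    print(f"{hop:26s} ~{bw:>5} GB/s  ({bw / hbm:.1%} of on-chip bandwidth)")
```

Each step down the ladder loses more than an order of magnitude, which is why splitting a model across more links hurts so much.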
Dylan explains that this is why comparing chips like Nvidia's Hopper and Blackwell is not just about raw computing power or flops. Even if they use the same process node, the architectural improvements in how chips communicate can lead to massive performance gains. For certain tasks, Blackwell can be twenty times faster than Hopper because of how it handles data movement.
The performance difference is not just going to be the difference in flops. It is cumulative between the difference in flops per chip, networking speed between chips, and system memory bandwidth. All of these things compound.
There is a growing trend to put more dies on a single package to keep communication speeds high. This is what Nvidia is doing with Blackwell and what Google and Amazon are doing with their own chips. Tesla previously tried a radical version of this with their Dojo chip. It was the size of an entire silicon wafer and contained 25 chips. While Dojo was excellent for certain tasks like convolutional neural networks, it struggled with the transformer models used in modern AI because of its specific memory and arithmetic trade-offs.
Looking toward 2030, the conversation shifts to China's ability to create a completely independent semiconductor supply chain. While China currently relies on equipment from the West and Taiwan, they are investing heavily in their own lithography tools. Dylan expects China to have working deep ultraviolet and extreme ultraviolet tools by 2030. However, having a working tool is different from being able to manufacture chips at a massive scale.
There is having it work and then there is production hell. ASML had extreme ultraviolet tools working in the early 2010s, but the tools were not accurate or reliable enough for high volume manufacturing yet. It takes a lot of time to ramp up production.
China's path to semiconductor indigenization and the AI compute race
The journey from a lab prototype to mass production in a semiconductor fab is a slow process. It took roughly seven years for EUV technology to make that transition. China is currently attempting to build a fully domestic supply chain. This is difficult because they must replace specialized components, such as projection optics, that they currently buy from Japan. Dylan estimates that China might eventually produce around 100 DUV tools annually. While this is impressive, it still trails the output of companies like ASML.
Production hell takes time. It took another five to seven years to get EUV into mass production at a fab rather than just it working in the lab.
The long-term competition between the West and China depends heavily on AI development timelines. If the path to advanced AI takes until 2035, China has a significant window to catch up in semiconductor manufacturing. However, the performance of AI models may soon start to diverge. Western companies are moving away from simply selling tokens. They are building models for automated white-collar work. These models use reasoning chains that are not visible to the user. This makes it much harder for other countries to distill the knowledge and catch up. Furthermore, the physical scale of compute is growing at an incredible rate. Major labs in the West are scaling to 10 gigawatts of capacity, a pace that China has not yet matched.
The link between AI investment timelines and global leadership
Major tech companies like Amazon and Google are spending hundreds of billions on data centers, and total investment in American data center infrastructure is reaching nearly one trillion dollars this year. The success of this investment depends on the return on capital. Early revenue growth from companies like Anthropic suggests high returns, even though compute limits still slow their growth.
Dylan explains that China has not yet built the infrastructure scale needed to deploy models at this level. If AI revenue continues to compound quickly, the US economy might grow much faster than China's. This could lead to a significant economic divergence between the West and China.
The revenue is compounding at such a rate that it does affect the economic growth. The resources these labs are gathering are going so fast, and China hasn't done that yet. In that case, the US and the West are actually diverging.
There is another possibility. These massive infrastructure investments might produce low returns. While the US relies on a global supply chain across many countries, China is building a fully vertical indigenous supply chain. If AI takes a long time to reach its full potential, China could eventually scale past the West.
Fast timelines, US wins. Long timelines, China wins. I don't think you have to believe in AGI to have the timelines where the US wins.
The trade-offs between HBM and commodity DRAM
AI accelerators currently rely on High Bandwidth Memory, or HBM. It is made from DRAM but offers much more bandwidth. While it might seem efficient to switch to cheaper commodity DRAM to increase memory capacity for tasks that do not need to be fast, several factors prevent this shift. Dylan notes that the most valuable tasks are usually time sensitive. Even for tasks that take hours, users generally prefer faster speeds over the cost savings of a slow mode.
The metric that you actually care about is bandwidth per wafer, not bits per wafer. Because the thing that is constraining the flops is just getting in and out the next matrix.
One of the core physical constraints of a chip is the space available on its edges for data to move. Current designs use these edges for high speed connections to HBM. Switching to standard DRAM would provide more storage capacity. However, it would also drastically reduce the speed at which data travels. Designers must balance four main constraints: processing power, network bandwidth, memory bandwidth, and memory capacity. In many cases, the bottleneck is simply getting data in and out of the chip. If a system switched to slower memory, the expensive processing power would sit idle while waiting for data. This makes the trade-off unattractive for high performance AI.
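A minimal sketch of that trade-off, assuming a hypothetical 70B-parameter model served in 8-bit and rounded bandwidth numbers roughly an order of magnitude apart (as the next section describes for HBM versus DDR5):

```python
# Lower bound on per-token latency from streaming the weights once per generated token.
# Hypothetical model size and rounded bandwidth numbers; illustrative only.

weights_gb = 70                                                   # 70B parameters in 8-bit
tiers = {"HBM-class memory": 3000, "commodity DRAM build": 300}   # GB/s, ~10x apart

for name, bw in tiers.items():
    ms_per_token = weights_gb / bw * 1000   # memory-bound floor, ignoring compute entirely
    print(f"{name:22s} ~{ms_per_token:6.1f} ms/token -> ~{1000 / ms_per_token:4.0f} tokens/s ceiling")
```

At the slower tier the expensive matrix units spend most of their time waiting on memory, which is exactly the idle-compute problem described above.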
The economic impact of the AI memory crunch
The current memory crunch is driven by the massive bandwidth needs of AI. HBM4 offers an order of magnitude more bandwidth than standard DDR5 memory within the same physical space. This efficiency is essential for AI compute but comes at a high cost. By 2026, memory could account for 30 percent of big tech capital expenditures. This shift creates a significant financial burden on the entire supply chain. Dylan explains that this trend will lead to more expensive and less capable consumer devices.
Memory crunch will continue to be harder and harder and prices continue to go up. This affects different parts of the market differently. People are going to hate AI more and more because now smartphones and PCs are not going to get incrementally better year on year. In fact, they are going to get incrementally worse.
The impact on the smartphone market is particularly severe. An iPhone could see a 150 dollar increase in its bill of materials as memory prices triple. While Apple might pass these costs to consumers, low-end and mid-range manufacturers like Xiaomi and Oppo have thinner margins. They are already cutting production volumes by half in some segments. Total global smartphone shipments could drop from over a billion units to 500 million as manufacturers struggle with rising costs.
The impact of AI on memory prices and the PC market
The rise of AI is drastically changing the memory market. Manufacturers are prioritizing AI chip makers over PC manufacturers. These AI companies sign longer contracts and pay higher margins because they can extract more value from their end users. This shift leaves fewer resources for traditional computers and gaming hardware.
DRAM gets released, goes to AI chips who are willing to do longer term contracts, willing to pay higher margins. This probably leads to people hating AI even more because memory prices have doubled and you can't get a new gaming GPU.
Consumer resentment toward AI is growing in the gaming community. People see memory prices doubling and blame AI applications for the shortage of GPUs and desktops. As memory prices continue to rise, this tension will likely get worse.
The physical constraints of memory production
Memory markets are shifting as NAND and DRAM prices both rise. While both are expanding capacity slowly, DRAM prices are expected to see larger increases. This is because smartphones and PCs consume a higher percentage of NAND. When consumer demand for these devices drops, more NAND is freed up for other markets. In contrast, the demand for DRAM in AI applications remains intense.
A major constraint in the industry is the lack of physical manufacturing space. Memory vendors stopped building new facilities in 2023 because prices were low and margins were thin. Now that demand has surged, the industry faces a significant delay. It takes about two years to build a new semiconductor fab. Consequently, meaningful new capacity may not be available until late 2027 or 2028.
Reasoning means long context, which means large KV cache, which means you need a lot of memory demand. It took a year for that to actually reflect in memory prices. Once memory prices reflected, then it took another six months for the memory vendors to start building fabs.
Dylan notes that AI requirements like long context windows drive this demand. Even if a leader like Elon Musk attempts to build a massive gigafab to produce a million wafers a month, the physical requirements are non-negotiable. While Elon is skilled at recruiting talent for ambitious projects, certain standards, like the extreme cleanliness of a fab, cannot be bypassed. In a functional fab, the air is replaced roughly every three seconds to maintain the environment necessary for chip production.
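A rough sketch of why long context inflates memory demand so sharply, using a hypothetical dense model configuration (the dimensions are made up for illustration):

```python
# KV-cache size grows linearly with context length; hypothetical model dimensions.
layers = 80
kv_heads = 8
head_dim = 128
bytes_per_value = 2          # fp16/bf16 cache

def kv_cache_gb(context_tokens: int) -> float:
    # The factor of 2 accounts for storing both keys and values at every layer.
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value / 1e9

for ctx in (8_000, 128_000, 1_000_000):
    print(f"{ctx:>9,} tokens -> ~{kv_cache_gb(ctx):6.1f} GB of KV cache per sequence")
```

Even this modest configuration swings from a few gigabytes at short context to hundreds of gigabytes at million-token context, all of it DRAM that someone has to manufacture.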
The future of 3D DRAM and lithography bottlenecks
Building a clean room is achievable in a year or two, but the real challenge lies in developing process technology and building wafers. This requires a massive amount of built-up knowledge. Currently, only companies like TSMC, Intel, and Samsung manage this level of complexity. While some might hope for a total disruption in lithography that replaces EUV with something simpler, the likelihood is very low. Most alternatives involve massive particle accelerators or synchrotrons to generate the necessary light, which are even more complicated to build.
The really complex part is actually developing the process technology and building wafers. I don't think he can develop that quickly. It has a lot of built up knowledge. It is the most complicated integration of very expensive tools and supply chain that is done.
There is a transition on the horizon for memory. The industry is moving toward 3D DRAM, which would allow for more layers and tighter vertical stacking. This shift would drastically increase the number of bits a single EUV pass can create. However, this change requires a complete retooling of fabs. You cannot easily convert a logic fab to a DRAM fab. Even moving between different generations of DRAM requires significant changes to the chemistry stacks and tool sets.
Lithography costs have historically trended upward as a percentage of total wafer cost. In 2014, it was around 16 percent, but it has grown to 30 percent recently. For DRAM, it is moving from the mid teens toward the 20s. If 3D DRAM becomes successful, this percentage might finally drop.
A major bottleneck in the industry is the supply of tools from manufacturers like ASML. In the energy sector, companies often place deposits on turbines years in advance to secure their spot. Some suggest that tech leaders should try to buy the rights to future EUV tools in a similar way. Dylan is skeptical that ASML would agree to such an arrangement. ASML likely prefers to maintain control over their margins and customer relationships rather than letting speculators flip spots in the queue.
Someone should go to the Netherlands and be like, I will pay you a billion dollars, you give me the right to purchase 10 EUV tools two years from now. Then you wait for everyone to realize they do not have enough tools and try and sell your option at a premium. But I do not think ASML would even agree to this.
Arbitraging semiconductor production capacity
The semiconductor market faces a significant gap between skyrocketing demand and limited production capacity. Companies like TSMC and ASML cannot ramp up their output fast enough to keep pace with current market needs. This creates a specific opportunity for arbitrage. If an investor realizes that demand is much higher than what these manufacturers are projecting, they can lock up future capacity through forward contracts.
If they can't increase production, just like TSMC cannot increase production that fast and yet demand is mooning, then the obvious solution is to arbitrage this. You arbitrage this by locking up the capacity and then doing like a forward contract and trying to sell it at a later date once other people realize actually shit, everything is fucked and we don't have enough capacity.
By securing this capacity early, one could sell it later at a much higher price once the rest of the market recognizes the severe shortage. This strategy captures the high margins that major manufacturers miss because they are not charging enough for their limited supply. However, it is unclear if companies like ASML and TSMC would ever agree to these types of contracts.
Meeting the power demands of AI data centers
Scaling power for AI requires more than just standard gas turbines. There is a significant gap between the power a plant generates and what a server actually uses. Losses happen during transmission and cooling. Because of this, the total energy capacity must be much higher than the actual IT load. Dylan explains that we can look to other industries for solutions. Airplane engines can be converted into turbines for data centers. Massive ship engines are already being used for this purpose in New Jersey.
In reality the nameplate capacity for energy is always way higher than the actual end critical IT capacity because of all of these factors.
Companies that build truck engines also have the capacity to scale up production for data centers. Fuel cells and solar power with battery storage offer more options. Another major opportunity lies in how we use the existing grid. The grid is built to handle the highest possible demand on the hottest days. This means a large portion of the grid remains idle most of the year. By using batteries or small power plants to handle those rare peaks, we can unlock that idle capacity for AI.
Today data centers are only 3.4 percent of the power of the US grid. And by 2028, they will be 10 percent. But if you can just unlock 20 percent of the US grid like this, it is not that crazy.
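A toy sketch of the peak-shaving idea, with entirely hypothetical grid numbers chosen only to show the mechanism Dylan is pointing at:

```python
# Peak shaving: the grid is sized for its worst hour, so average utilization leaves headroom.
# All numbers here are hypothetical, chosen only to illustrate the mechanism.

peak_capacity_gw = 1000        # what the grid is built to deliver on the hottest afternoon
average_load_gw = 600          # typical load during most hours of the year
datacenter_flexibility = 0.8   # fraction of datacenter load that batteries or on-site
                               # generation can carry during the few peak hours each year

headroom_gw = peak_capacity_gw - average_load_gw
usable_gw = headroom_gw * datacenter_flexibility
print(f"idle headroom ~{headroom_gw} GW; ~{usable_gw:.0f} GW usable by loads "
      f"that can shed or self-supply during peaks")
```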
Solving power and labor shortages in data center construction
Building the infrastructure for modern AI requires overcoming significant engineering hurdles and taking risks with new technologies. Behind the meter gas generation is becoming a key strategy for powering data centers. While certain turbine components have lead times stretching beyond 2030, many alternative energy sources are available. Dylan notes that technologies like reciprocating engines, ship engines, and fuel cells can each provide tens of gigawatts of power. These energy supply chains are much simpler than the ones used for semiconductor manufacturing.
Any of these individually will do tens of gigawatts. And in a whole they will do hundreds of gigawatts.
Rising power costs are not a major concern for AI developers. Even if energy prices double, the total cost of ownership for high end hardware only increases by a small fraction. The value gained from faster model improvements far outweighs the extra cost of electricity. This allows for the use of energy technologies that might be more expensive than traditional utility grid power but are faster to deploy.
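A quick sketch of why doubling electricity prices barely moves the total, assuming illustrative costs for a single high-end GPU (these inputs are assumptions, not figures from the conversation):

```python
# Share of total cost of ownership that electricity represents for one high-end GPU.
# Illustrative assumptions, not figures from the conversation.

annualized_capex = 10_000    # USD/yr: ~$40k of GPU, server, and networking over ~4 years
power_draw_kw = 1.4          # GPU plus cooling and facility overhead
base_price_per_kwh = 0.08    # USD

def tco(price_per_kwh: float) -> float:
    return annualized_capex + power_draw_kw * 8760 * price_per_kwh

base, doubled = tco(base_price_per_kwh), tco(2 * base_price_per_kwh)
print(f"power share at the base price: {(base - annualized_capex) / base:.0%}")
print(f"doubling electricity raises total cost by only {(doubled - base) / base:.0%}")
```

With power at roughly a tenth of the total, even a doubled electricity bill is a rounding error next to the value of getting hardware online sooner.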
Labor remains a primary constraint for scaling data centers. Building a single large site can require thousands of workers. To solve this, the industry is moving toward modularization. Instead of building everything on site, complex systems are assembled in factories and shipped as integrated units. Dylan explains that these modules can include entire rows of servers with cooling and power systems already connected. This approach reduces the need for on site electricians and plumbers.
The main factor for reducing the number of people is modularizing things and making them in factories. You ship a fully integrated thing that has a lot of the cooling subsystems already put together.
Future hardware setups like the next generation Nvidia systems will feature much higher power density. These integrated skids will simplify installation. Instead of cabling individual racks, workers will connect larger blocks that arrive with networking and power already integrated. This shift helps bypass the shortage of skilled labor while speeding up the construction of massive computing facilities.
The role of modularization in data center expansion
Modularization is significantly reducing the number of people needed to work in data centers, which makes it possible to build far more capacity. Companies like Crusoe, Google, and Meta are leading the way by adopting these modular designs, while others are slower to move. This variation in speed creates temporary dislocations in the market. Companies that move too quickly might face technical delays, while those that move too slowly often encounter labor problems.
This drastically can reduce the amount of people working in data centers, and therefore the capability to build these will be much larger.
The supply chain for these projects remains highly complex. However, these challenges are expected to be resolved through capitalism and human ingenuity within the necessary timeframes. The move toward modularization is a key part of solving the labor and scale issues currently facing the industry.
The logistical hurdles of space-based AI data centers
Building massive data centers on Earth faces hurdles like permitting and air pollution regulations, but these are often manageable. America has vast amounts of space in states like Texas and Wyoming where red tape is less of a factor. While some argue that moving to space is necessary due to power constraints or regulations, the reality is that energy is only a small fraction of the total cost of owning a data center. Most AI infrastructure remains in the United States because companies are finding ways to work through local requirements rather than seeking extreme alternatives.
Dylan points out that GPUs are notoriously unreliable. About 15 percent of new Blackwell GPUs require some form of physical maintenance or replacement after deployment. In a traditional data center, technicians can quickly swap parts or fix connections. If those same units are sent into space, the process of testing, deconstructing, shipping, and relaunching them could take six months or more. This delay is a major drawback because compute is most valuable at the beginning of its life cycle. Taking half a year to get a cluster online means losing a significant portion of its useful life and missing out on immediate revenue or model development.
The thing that separates these clouds is the ability to deploy and manage failure. If a GPU has a useful life of five years and it takes six additional months to get it into space, that is 10 percent of your cluster's useful life. Now is always the most important moment.
Technical constraints also extend to communication. While satellite technology like Starlink allows for fast data transfer between units, it still falls short of the requirements for modern AI clusters. High-end networking for GPUs needs bandwidth that is significantly higher than what current optical satellite links can provide. As hardware advances from the Hopper generation to Blackwell and beyond, these bandwidth requirements continue to double. This makes space-based communication a massive bottleneck for high-performance computing.
The technical and economic barriers to space data centers
Modern AI models are becoming more sparse, which requires running them across hundreds or thousands of chips. This creates a massive networking challenge. In a space-based data center, you would need complex interconnects to link different satellites. These systems are much more expensive and less reliable than the fiber optic transceivers used on Earth. While a cluster on land might spend fifteen percent of its cost on networking, a space-based version would require sophisticated lasers that are difficult to maintain.
Networking these chips together is a problem. You can't just make the satellite infinitely large because there are challenges with physics. Those interconnects are more expensive than a cluster on land. All of a sudden you're making it like space lasers instead of simple lasers manufactured in high volumes.
The real bottleneck for AI today is not energy or land, but the production of the chips themselves. The goal is to get every manufactured chip generating tokens as quickly as possible. Modularizing data centers on Earth allows for much faster deployment than launching hardware into orbit. Dylan suggests that space data centers might only make sense once the industry can produce enough chips and faces extreme energy constraints on Earth. He notes that space data centers are not yet the type of tenfold improvement that leaders like Elon Musk usually pursue.
All that matters in a chip constrained world is get these chips working on producing tokens ASAP. In space, higher watts per millimeter is very difficult, whereas on Earth, these are solved problems.
Thermal management also favors Earth. Increasing power density to get more performance requires exotic liquid or immersion cooling. These cooling methods are well-understood on the ground but become significantly more complex in a zero-gravity environment. Until land and energy become the primary limiting factors, the focus remains on terrestrial efficiency.
Comparing chip communication and scale up topologies
Communication speed drops significantly as distance increases. Within a chip or a rack, data transfers reach terabytes per second. Across a country, speed slows to gigabytes per second. The scale up domain refers to the high-speed zone where chips communicate at terabyte speeds. NVIDIA recently expanded its scale up domain from 8 GPUs in an H100 server to 72 GPUs in the Blackwell rack. These chips use an all-to-all connection. This means any chip can send data to any other chip at full speed without intermediate hops.
Dylan explains that Google uses a different approach for its TPUs. Their pods include thousands of chips, but they use a torus topology where each chip only connects to six neighbors. This creates a different set of constraints for how data moves through the system.
Google gets to have a massive scale up domain, but then they have the trade off of you have to bounce across chips to get from one chip to another. You can only talk to six direct neighbors.
Amazon falls somewhere between these two models by using switches and specific topologies. All three companies are now moving toward a dragonfly topology. This design combines fully connected elements with partially connected ones. It allows the scale up domain to include hundreds or thousands of chips without causing resource contention during data transfers.
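A small sketch contrasting the two extremes: in an all-to-all domain every chip is one hop away, while in a 3D torus the worst case grows with the pod's dimensions. The torus size below is illustrative, not a specific Google product configuration.

```python
# Worst-case hop count: all-to-all scale-up domain vs. a 3D torus.

def torus_max_hops(dims):
    # In a torus, the farthest chip along each axis is half the ring away (wraparound links).
    return sum(d // 2 for d in dims)

all_to_all_chips, all_to_all_hops = 72, 1   # e.g. a Blackwell NVL72 rack: one hop to any peer
torus_dims = (16, 16, 16)                   # illustrative 4,096-chip pod, 6 neighbors per chip
torus_chips = torus_dims[0] * torus_dims[1] * torus_dims[2]

print(f"all-to-all: {all_to_all_chips} chips, worst case {all_to_all_hops} hop")
print(f"3D torus:   {torus_chips} chips, worst case {torus_max_hops(torus_dims)} hops")
```

Dragonfly designs sit between these extremes, keeping most traffic inside fully connected groups while still letting the scale-up domain reach thousands of chips.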
The strategic trade-off in model parameter scaling
The slowdown in parameter scaling is often attributed to hardware constraints, specifically the memory capacity of Nvidia chips compared to Google's TPUs. While memory and bandwidth are factors, the real driver is the strategic allocation of compute resources between inference, development, and research. Research is where the most significant gains happen, often making model training ten times cheaper each year. To maintain this progress, labs prefer to dedicate the majority of their compute to research rather than just building the largest possible model.
Reinforcement learning creates a significant bottleneck for massive models. A five trillion parameter model requires far more time and compute for rollouts than a one trillion parameter model. Even if a larger model is more sample efficient, the smaller model can undergo twice as many rollouts in the same timeframe. This allows the model to be finished and deployed sooner.
In isolation, you almost always want to go with a smaller model that gets RL'd faster and gets deployed into research and development so you can build the next thing and get more compute efficiency wins.
Google can deploy larger models like Gemini Pro because they have a unified hardware environment with TPUs. This allows them to optimize for scale-up domains and speed up the feedback loop. However, for most labs, the goal is the fastest possible takeoff. This is achieved by creating smaller, smarter models quickly and using them to accelerate the next cycle of development. This compounding effect of doing research faster leads to a quicker overall advance in the technology.
How conviction and data interpretation drive investment success
Dylan provides data and analysis to a wide range of clients, including tech companies and hedge funds. While many people use his spreadsheets and reports, their success often depends on how they interpret the information. Most clients initially believe the projected numbers are too high. It often takes months of presenting facts and data before they realize the figures are actually correct.
Leopold is pretty much the only person who tells me my numbers are too low always. Everyone else tells me our numbers are too high almost ad nauseam.
The difference in success often comes down to conviction and identifying market inefficiencies. For example, predicting a memory crunch required a deep understanding of AI infrastructure. As context lengths for AI models grew, the need for memory exploded. A year ago, many people thought the idea of memory prices quadrupling while smartphone sales dropped was impossible. Those who trusted the data and understood the supply chain were able to make significant trades before the rest of the market caught up.
The shifting power balance between Apple and AI at TSMC
A significant shift is happening in the relationship between TSMC and its largest customers. Traditionally, Apple has been the first to move to new chip nodes and has enjoyed special treatment. However, AI accelerators are now taking priority. Companies like Nvidia, Amazon, and Google are willing to prepay for manufacturing capacity and capital expenditures. This change means Apple may lose the flexibility it once had to adjust order volumes based on seasonal demand.
Apple will become a smaller percentage of TSMC's revenue and therefore be less relevant for TSMC to cater to their demands. TSMC could eventually start saying you have to pre-book your capacity for next year and you have to prepay for the capex because that is what Nvidia and Amazon and Google are doing.
For the upcoming N2 node, Apple still holds a large portion of the capacity, but other players are catching up. AMD is taking a major risk by attempting to launch CPU and GPU chiplets on N2 in the same timeframe as Apple. By the time TSMC reaches the A16 node, the first customer is expected to be an AI company rather than Apple. This marks a turning point where Apple is no longer the undisputed leader in adopting new process technologies.
Dylan suggests that Huawei could have been a dominant force if trade restrictions had not intervened. Before the ban, Huawei was on track to eclipse Apple as TSMC's largest customer. Huawei possesses a unique combination of elite software engineering, networking expertise, and AI talent. This vertical integration might have allowed them to outperform Nvidia in certain areas.
It is very reasonable that if Huawei was not banned from using TSMC, they would have kept gaining share and they would likely be TSMC's biggest customer. Huawei is arguably the only company in the world that has all the legs. They have cracked software engineers, networking technologies, and AI talent.
The split compute model for humanoid robots
Humanoid robots will likely rely on a split compute model to function efficiently. High-level tasks such as long-horizon planning and complex object identification are better suited for the cloud. This allows massive models to process data in batches, which is far more efficient than running everything locally on the robot. The robot itself can then focus on immediate physical interpolation, such as adjusting force or grip based on the weight of an object it is picking up.
A lot of the planning and longer horizon tasks are determined by a much more capable model in the cloud that runs at very high batch sizes. And then it pushes those directions to the robots, who then interpolate between each subsequent action.
Power consumption is a major hurdle for mobile robotics. Leading-edge chips required for high-level intelligence consume significant power. If millions of humanoids performed all their processing on-device, it would drain batteries rapidly and pull massive amounts of compute away from AI data centers. Moving the heavy lifting to the cloud preserves the robot's battery life and allows for more sophisticated intelligence than a portable device could handle.
Dylan notes that this architecture leads to a future where intelligence is physically centralized in data centers rather than distributed in the heads of individual units. This shift explains strategic moves in the semiconductor supply chain. For example, Elon Musk is diversifying his chip production by working with Samsung in Texas. This strategy avoids the geopolitical risks associated with Taiwan and ensures he is not competing with every other AI company for capacity at TSMC.
Elon Musk recognizes this, which is why he's going to different places for his chips. He signed this massive deal with Samsung to make his robot chips in Texas because he thinks that Taiwan risk is huge. He gets this geopolitical diversification, but also supply chain diversity for his robots.
The strategic risk of Taiwan's semiconductor supply chain
The semiconductor supply chain faces a unique circular dependency. Many of the tools required to build chips actually contain semiconductors made in Taiwan. This creates a situation where the industry is like a snake eating its own tail. You cannot produce the necessary lithography tools without the chips, but you cannot make the chips without the tools. While there is some diversification, the concentration of manufacturing remains a significant bottleneck.
These tools actually use a lot of semiconductors which are manufactured in Taiwan. So it's like a snake eating its own tail sort of meme. Because you can't make the tools without the chips from Taiwan, which you can't use without the tools in Taiwan.
Simply airlifting engineers out of Taiwan during a conflict would not solve the problem. Dylan explains that even if you have the know-how, replicating the massive capacity of TSMC in a new location like Arizona would take years. Losing Taiwan would not just slow down growth. It would cause a massive contraction in global GDP. The ability to add new compute capacity would drop from hundreds of gigawatts per year to almost nothing. Existing capacity would remain, but it is small compared to the scale of current expansion efforts.
